Algorithms and Applications of Next-Generation DNA Sequencing



Algorithms and Applications of Next-Generation DNA Sequencing: ChIP-Seq, database of human variations, and analysis of mammary ductal carcinomas

by Anthony Peter Fejes

Bachelor of Science, Biochemistry (Hons. Co-op), University of Waterloo, 2000
Bachelor of Independent Studies, University of Waterloo, 2001
Master of Science, Microbiology & Immunology, The University of British Columbia, 2004

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in THE FACULTY OF GRADUATE STUDIES (Bioinformatics)

The University of British Columbia (Vancouver)
April 2012
© Anthony Peter Fejes, 2012

Abstract

Next Generation Sequencing (NGS) technologies enable Deoxyribonucleic Acid (DNA) or Ribonucleic Acid (RNA) sequencing at volumes and speeds several orders of magnitude greater than Sanger (dideoxy termination) based methods, and have enabled novel experiment types that would not have been practical before the advent of NGS-based machines. The dramatically increased throughput of these new protocols requires significant changes to the algorithms used to process and analyze the results. In this thesis, I present novel algorithms for Chromatin Immunoprecipitation and Sequencing (ChIP-Seq), as well as the structures required and the challenges faced when working with Single Nucleotide Variations (SNVs) across a large collection of samples. Finally, I present the results of an NGS-based analysis of eight mammary ductal carcinoma cell lines and four matched normal cell lines.

Preface

The work described in this thesis is based entirely upon research done at Canada’s Michael Smith Genome Sciences Centre (BCGSC) in Dr. Steve J.M. Jones’ group by Anthony Fejes.
Two exceptions to this statement are the work on the DICER1 gene and the Motif Identification for ChIP-Seq Analysis (MICSA) software package, both of which involved collaborative work for which Anthony Fejes was granted co-authorship on subsequent publications. Contributions for each collaboration are detailed below.

Work on chapter 2 was done by Anthony Fejes, with code contributions by Timothee Cezard and with the guidance of Drs. Gordon Robertson and Mikhail Bilenky. Code contributions consist of the Lander-Waterman algorithm (implemented by Dr. Bilenky and merged into the FindPeaks code repository by Timothee Cezard), as well as numerous bug fixes contributed by Timothee Cezard. The work in this chapter is, in part, published in an application note written by Anthony Fejes:

A. P. Fejes et al. “FindPeaks 3.1: a tool for identifying areas of enrichment from massively parallel short-read sequencing technology”. In: Bioinformatics 24 (Aug. 2008), pp. 1729–1730

Background information and a literature review on Chromatin Immunoprecipitation and Sequencing (ChIP-Seq) were published in the textbook:

A. P. Fejes and S. J. Jones. “Chip-Seq: Mapping of Protein-DNA Interactions”. In: Next-Generation Genome Sequencing: Towards Personalized Medicine. Ed. by Michal Janitz. John Wiley & Sons, November 2008

Discussion of the MICSA software and extensions to the FindPeaks package in chapter 2 describes collaborative work directed by Valentina Boeva of the Institut Curie. Contributions to this publication included the completed FindPeaks 3.3 software, used as the basis for the MICSA algorithms, as well as support and consultations with the co-authors.

V. Boeva et al. “De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis”. In: Nucleic Acids Res. 38 (June 2010), e126

Work on chapter 3 was done by Anthony Fejes, with contributions to insertion/deletion processing by Alireza Hadj Khodabakhshi.
An He assisted by automating the import of the data sets into the database. The work in this chapter is, in part, published in an application note written by Anthony Fejes with contributions by Alireza Hadj Khodabakhshi:

A. P. Fejes et al. “Human variation database: an open-source database template for genomic discovery”. In: Bioinformatics 27 (Apr. 2011), pp. 1155–1156

Details of the exact contributions of developers to the code base discussed in chapters 2 and 3 can be obtained through the code repository at http://vancouvershortr.svn.sourceforge.net/viewvc/vancouvershortr.

The work on the dicer 1, ribonuclease type III (DICER1) gene was performed in collaboration with Alireza Moussavi in the Huntsman lab. Contributions to this work included customized searches to assist in the identification and classification of recurrent variations, and access to the software and database described in chapter 3. This work has been published:

A. Heravi-Moussavi et al. “Recurrent Somatic DICER1 Mutations in Nonepithelial Ovarian Cancers”. In: N Engl J Med (Dec. 2011)

Work on chapter 4 was done by Anthony Fejes, using data generated at the B.C. Genome Sciences Centre, with the assistance of Richard Varhol (alignment), Nina Theissen (Single Nucleotide Polymorphism (SNP)-calling) and Karen Mungal and Readman Chu (Ribonucleic Acid (RNA) assembly). This work has not been published.

Work on chapter 5 was done by Anthony Fejes (bioinformatics) and Steven Leach (wet lab work, including Sanger sequencing), with the exception of the screening of the panel of ductal carcinomas, which was organized by Sohrab Shah and Kane Tse and analyzed by Anthony Fejes. This work has not been published.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Glossary
List of Acronyms
Acknowledgements
1 Introduction
1.1 Research Presented
1.1.1 Research not Included
1.1.2 Outline
1.1.3 Goals of this Research
1.2 Vancouver Short Read Analysis Package
1.2.1 Open Source Bioinformatics
1.2.2 Why do Open Source Bioinformatics?
1.2.3 Libraries
1.2.4 Availability
1.3 ChIP-Seq
1.3.1 Background
1.3.2 History of Chromatin Immunoprecipitation
1.3.3 Medical Applications of ChIP-Seq
1.3.4 Challenges
1.3.5 Future Uses of ChIP-Seq
1.3.6 FindPeaks
1.4 Variation Database
1.4.1 Variations
1.4.2 Pipelines Producing SNP- and SNV-Calls
1.4.3 Single Nucleotide Polymorphism Databases
1.5 Relational Databases
1.5.1 SQL Access
1.5.2 PostgreSQL
1.5.3 ODBC
1.5.4 Keys and Indexing
1.5.5 Hardware Performance
1.5.6 Optimization
1.5.7 Background on the Variation Database
1.6 Breast Cancer
1.6.1 Molecular Subtypes
1.6.2 ATCC Mammary Ductal Carcinoma Cell lines
1.6.3 Epstein-Barr/B-Cell Derived Matched Normals
1.6.4 Research Done
1.6.5 Recurrent Variations
1.6.6 Purpose
1.7 Notch Genes, Strawberry Notch and the Epidermal Growth Factor
1.7.1 Pathways
1.7.2 Notch Signalling
2 ChIP-Seq
2.1 FindPeaks
2.2 Paired End Tag versus Single End Tag ChIP-Seq
2.3 Read Length Modelling
2.3.1 Native Lengths - No Extension
2.3.2 Hard Extension
2.3.3 Triangle Distribution
2.3.4 Read Shifting
2.4 Peak Calling
2.4.1 Trimming Peaks
2.4.2 Peak Separation
2.5 False Discovery Rates and ChIP-Seq Controls
2.5.1 Sources of Error
2.5.2 Simulated Control - Monte Carlo
2.5.3 Simulated Control - Lander-Waterman
2.5.4 Minimal Biological Control - Null Immunoprecipitation Control
2.5.5 Biological Control
2.6 Comparing ChIP-Seq Experiments
2.6.1 Normalization of ChIP-Seq Results
2.6.2 Normalization by Equivalent Peaks
2.6.3 Limitation of Normalization by Equivalent Peaks
2.6.4 Statistics
2.6.5 Post-Normalization Processing
2.7 Analysis of Normalized Samples
2.7.1 Comparison by Ratio - Method of Perpendicular Lines
2.7.2 Comparison by Equivalent Areas - “Method of Hyperbolic Sections”
2.8 Example - Extending FindPeaks
2.8.1 Method
2.8.2 EWS-FLI1
2.9 Summary
3 Variation Database
3.1 SNVs and INDELs
3.2 Methods
3.2.1 Novel Functions
3.2.2 Data
3.2.3 Graphic Output
3.2.4 Input Formats
3.2.5 Library Information
3.2.6 Variation Annotations
3.3 Design
3.3.1 Design Philosophies
3.4 Modularity
3.4.1 Database Application Programming Interface
3.4.2 File Iterators
3.4.3 User Interface
3.5 Common Use-Cases and User Interactions
3.5.1 Querying
3.5.2 ExperimentalRecord
3.5.3 Concordance
3.5.4 Modifying the Application Programming Interface (API) and the User Interface (UI)
3.5.5 Ensembl
3.6 Applications Using the Variation Database
3.6.1 Filtering Polymorphisms
3.6.2 Filtering Recurrent Variations
3.6.3 Filtering to Identify Cancer Drivers
3.6.4 Variations Only Found in Cancer
3.6.5 Variations Never Found in Cancer
3.6.6 RNA Editing
3.6.7 Transition and Transversion Frequency
3.6.8 Growth of the Database ...
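The outline names hard read extension (2.3.2), peak calling (2.4) and a Lander-Waterman simulated control (2.5.3), and the preface credits a Lander-Waterman implementation in FindPeaks, but this preview does not show how they work. The Python below is a minimal, hypothetical sketch of how the three ideas can fit together, not FindPeaks code: each read is extended to an assumed fragment length L, per-base coverage is piled up, and the minimum peak height is taken as the smallest h for which a Poisson background with mean λ = NL/G (the Lander-Waterman coverage for N fragments on a genome of size G) predicts at most a chosen number of bases at coverage ≥ h. The read format (0-based 5' position, strand), the strand convention and all function names are assumptions made for illustration.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical read format: (0-based 5' position, strand '+' or '-').
Read = Tuple[int, str]


def hard_extend_pileup(reads: List[Read], frag_len: int, chrom_len: int) -> List[int]:
    """Per-base coverage after extending every read to frag_len bases.

    Assumed convention: '+' reads extend rightwards from their 5' end,
    '-' reads extend leftwards from their 5' end.
    """
    diff = [0] * (chrom_len + 1)  # difference array: +1 at start, -1 at exclusive end
    for pos, strand in reads:
        if strand == "+":
            start, end = pos, pos + frag_len
        else:
            start, end = pos - frag_len + 1, pos + 1
        start, end = max(0, start), min(chrom_len, end)  # clip to the chromosome
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    coverage, running = [], 0
    for i in range(chrom_len):
        running += diff[i]
        coverage.append(running)
    return coverage


def call_peaks(coverage: List[int], min_height: int) -> List[Tuple[int, int, int]]:
    """Contiguous runs with coverage >= min_height as (start, end_exclusive, max_height)."""
    if min_height < 1:
        raise ValueError("min_height must be at least 1")
    peaks: List[Tuple[int, int, int]] = []
    start: Optional[int] = None
    for i, c in enumerate(coverage + [0]):  # trailing 0 closes a run that reaches the end
        if c >= min_height and start is None:
            start = i
        elif c < min_height and start is not None:
            peaks.append((start, i, max(coverage[start:i])))
            start = None
    return peaks


def _poisson_pmf(i: int, lam: float) -> float:
    """P(X = i) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k <= lam:
        # The upper tail is large here, so 1 - CDF is accurate enough.
        return max(0.0, 1.0 - sum(_poisson_pmf(i, lam) for i in range(k)))
    # Far tail: sum P(X = k) + P(X = k + 1) + ... directly; terms shrink since lam < i.
    term, total, i = _poisson_pmf(k, lam), 0.0, k
    while term > 0.0 and term > total * 1e-17:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
    return total


def lander_waterman_min_height(n_reads: int, frag_len: int, genome_size: int,
                               max_expected_fp: float = 1.0) -> int:
    """Smallest h with genome_size * P(coverage >= h) <= max_expected_fp.

    Per-base coverage is modelled as Poisson(lam) with lam = N * L / G.
    This counts expected *bases* above h, a simplification of counting peaks.
    """
    if max_expected_fp <= 0:
        raise ValueError("max_expected_fp must be positive")
    lam = n_reads * frag_len / genome_size
    h = 1
    while genome_size * poisson_sf(h, lam) > max_expected_fp:
        h += 1
    return h


if __name__ == "__main__":
    reads = [(2, "+"), (3, "+"), (8, "-")]
    cov = hard_extend_pileup(reads, frag_len=5, chrom_len=20)
    print(cov[:10])                                    # [0, 0, 1, 2, 3, 3, 3, 2, 1, 0]
    print(call_peaks(cov, min_height=3))               # [(4, 7, 3)]
    print(lander_waterman_min_height(100, 10, 1000))   # 6
```

With the toy numbers in the demo (100 fragments of 10 bp on a 1,000 bp sequence, so λ = 1.0), about 3.7 bases are expected at coverage ≥ 5 but only about 0.6 at coverage ≥ 6, so the sketch picks 6. Real ChIP-Seq background is far from uniform, so a Poisson model like this is at best a first approximation; the outline also lists Monte Carlo and biological controls (2.5.2 to 2.5.5).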
Recommended publications
  • Broad Poster Vivek
    A novel computational method for finding regions with copy number abnormalities in cancer cells. Vivek, Manuel Garber, and Mike Zody, Broad Institute of MIT and Harvard, Cambridge, MA, USA
    Introduction: Cancer can result from the overexpression of oncogenes, genes which control and regulate cell growth. Sometimes oncogenes increase in activity due to a specific genetic mutation called a translocation (Fig 1). This translocation allows the oncogene to remain as active as its paired gene. Amplification of this mutation can occur, thereby creating the proper conditions for uncontrolled cell growth; consequently, each component of the translocation will amplify in similar quantities. In this mutation, the chromosomal region containing the oncogene is displaced to a region on another chromosome containing a gene that is expressed frequently.
    Fig 1. Two chromosomal regions (abcdef and ghijk) are translocating to create two new regions (abckl and ghijedf).
    Results from the analysis program (Region 1, Region 2, R2, and the actual region containing the gene):
    1. SMAD4, a gene known to be deleted in pancreatic carcinoma: Chr18:47044749-47311978, Chr17:13930739-14654741, R2 = 0.499070821478475; actual region chr18:45,842,214-48,514,513
    2. COX10, a gene deleted in cytochrome c oxidase deficiency: Chr17:13930739-14654741, Chr18:26861790-27072166, R2 = 0.47355172850856; actual region chr17:13,966,862-14,068,461
    3. AK001392, a hereditary prostate cancer protein, known to be related to cell proliferation: Chr17:12542326-13930738, Chr8:1789292-1801984, R2 = 0.406208680312004; actual region chr17:12,542,326-13,930,738
  • Association of Gene Ontology Categories with Decay Rate for HepG2 Experiments: These Tables Show Details for All Gene Ontology Categories
    Supplementary Table 1: Association of Gene Ontology Categories with Decay Rate for HepG2 Experiments. These tables show details for all Gene Ontology categories. Inferences for manual classification scheme shown at the bottom. Those categories used in Figure 1A are highlighted in bold. Standard deviations are shown in parentheses. P-values less than 1E-20 are indicated with a "0".
    GO Number | Category Name | Probe Sets | Rate r (hour^-1): Group | Non-Group | Distribution | p-value | Half-life < 2 hr, Decay %: In-Group | Non-Group | Representation | p-value
    GO:0006350 | transcription | 1523 | 0.221 (0.009) | 0.127 (0.002) | FASTER | 0 | 13.1 (0.4) | 4.5 (0.1) | OVER | 0
    GO:0006351 | transcription, DNA-dependent | 1498 | 0.220 (0.009) | 0.127 (0.002) | FASTER | 0 | 13.0 (0.4) | 4.5 (0.1) | OVER | 0
    GO:0006355 | regulation of transcription, DNA-dependent | 1163 | 0.230 (0.011) | 0.128 (0.002) | FASTER | 5.00E-21 | 14.2 (0.5) | 4.6 (0.1) | OVER | 0
    GO:0006366 | transcription from Pol II promoter | 845 | 0.225 (0.012) | 0.130 (0.002) | FASTER | 1.88E-14 | 13.0 (0.5) | 4.8 (0.1) | OVER | 0
    GO:0006139 | nucleobase, nucleoside, nucleotide and nucleic acid metabolism | 3004 | 0.173 (0.006) | 0.127 (0.002) | FASTER | 1.28E-12 | 8.4 (0.2) | 4.5 (0.1) | OVER | 0
    GO:0006357 | regulation of transcription from Pol II promoter | 487 | 0.231 (0.016) | 0.132 (0.002) | FASTER | 6.05E-10 | 13.5 (0.6) | 4.9 (0.1) | OVER | 0
    GO:0008283 | cell proliferation | 625 | 0.189 (0.014) | 0.132 (0.002) | FASTER | 1.95E-05 | 10.1 (0.6) | 5.0 (0.1) | OVER | 1.50E-20
    GO:0006513 | monoubiquitination | 36 | 0.305 (0.049) | 0.134 (0.002) | FASTER | 2.69E-04 | 25.4 (4.4) | 5.1 (0.1) | OVER | 2.04E-06
    GO:0007050 | cell cycle arrest | 57 | 0.311 (0.054) | 0.133 (0.002) | ...
  • Knowledge Management Environments for High Throughput Biology
    Knowledge Management Environments for High Throughput Biology. Abhey Shah. A Thesis submitted for the degree of MPhil, Biology Department, University of York, September 2007.
    Abstract: With the growing complexity and scale of data sets in computational biology and chemoinformatics, there is a need for novel knowledge processing tools and platforms. This thesis describes a newly developed knowledge processing platform that is different in its emphasis on architecture, flexibility, built-in facilities for data mining and easy cross-platform usage. There exist thousands of bioinformatics and chemoinformatics databases, stored in many different forms with different access methods; this is a reflection of the range of data structures that make up complex biological and chemical data. Starting from a theoretical basis, FCA (Formal Concept Analysis), an applied branch of lattice theory, is used in this thesis to develop a file system that automatically structures itself by its contents. The procedure of extracting concepts from data sets is examined. The system also finds appropriate labels for the discovered concepts by extracting data from ontological databases. A novel method for scaling non-binary data for use with the system is developed. Finally, the future of integrative systems biology is discussed in the context of efficiently closed causal systems.
    Contents: 1 Motivations and goals of the thesis; 1.1 Conceptual frameworks; 1.2 Biological foundations; 1.2.1 Gene expression data; 1.2.2 Ontology; 1.3 Knowledge based computational environments; 1.3.1 Interfaces; 1.3.2 Databases and the character of biological data ...
  • Aneuploidy: Using Genetic Instability to Preserve a Haploid Genome?
    Health Science Campus. FINAL APPROVAL OF DISSERTATION. Doctor of Philosophy in Biomedical Science (Cancer Biology).
    Aneuploidy: Using genetic instability to preserve a haploid genome? Submitted by: Ramona Ramdath, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biomedical Science.
    Examination Committee: Major Advisor: David Allison, M.D., Ph.D. Academic Advisory Committee: James Trempe, Ph.D.; David Giovanucci, Ph.D.; Randall Ruch, Ph.D.; Ronald Mellgren, Ph.D. Senior Associate Dean, College of Graduate Studies: Michael S. Bisesi, Ph.D. Date of Defense: April 10, 2009.
    Aneuploidy: Using genetic instability to preserve a haploid genome? Ramona Ramdath, University of Toledo, Health Science Campus, 2009.
    Dedication: I dedicate this dissertation to my grandfather who died of lung cancer two years ago, but who always instilled in us the value and importance of education. And to my mom and sister, both of whom have been pillars of support and stimulating conversations. To my sister, Rehanna, especially: I hope this inspires you to achieve all that you want to in life, academically and otherwise.
    Acknowledgements: As we go through these academic journeys, there are so many along the way that make an impact not only on our work, but on our lives as well, and I would like to say a heartfelt thank you to all of those people: My committee members, Dr. James Trempe, Dr. David Giovanucchi, Dr. Ronald Mellgren and Dr. Randall Ruch, for their guidance, suggestions, support and confidence in me. My major advisor, Dr. David Allison, for his constructive criticism and positive reinforcement.
  • Functional Genomics and Cell Biology of the Dolphin (Tursiops truncatus): Establishment of Novel Molecular Tools to Study Marine Mammals in Changing Environments
    Alma Mater Studiorum, University of Bologna. Faculty of Mathematical, Physical and Natural Sciences. Doctoral School in Biological, Biomedical and Biotechnological Sciences. Physiology and Cellular Biology PhD Program, Cycle XXII. SSD: BIO/11.
    FUNCTIONAL GENOMICS AND CELL BIOLOGY OF THE DOLPHIN (Tursiops truncatus): ESTABLISHMENT OF NOVEL MOLECULAR TOOLS TO STUDY MARINE MAMMALS IN CHANGING ENVIRONMENTS
    PhD Student: Dr. Annalaura Mancia. Program Coordinator: Prof. Michela Rugolo, PhD. Supervisor: Prof. Marialuisa Melli, PhD. Final PhD exam 2010.
    To my sister, Roberta, once again. “Dolphins are ‘non-human persons’ who qualify for moral standing as individuals” (Thomas White). “Research is what I'm doing when I don't know what I'm doing” (Wernher von Braun).
    ABSTRACT: The dolphin (Tursiops truncatus) is a mammal that is adapted to life in a totally aquatic environment. Despite the popularity and even iconic status of the dolphin, our knowledge of its physiology, its unique adaptations and the effects of environmental stressors on it is limited. One approach to improve this limited understanding is the implementation of established cellular and molecular methods to provide sensitive and insightful information for dolphin biology. We initiated our studies with the analysis of wild dolphin peripheral blood leukocytes, which have the potential to be informative of the animal’s global immune status.
  • Effect of Cryptorchidism on the Human Testicular Transcriptome
    Marie Eve Bergeron. EFFECT OF CRYPTORCHIDISM ON THE HUMAN TESTICULAR TRANSCRIPTOME. Thesis presented to the Faculty of Graduate and Postdoctoral Studies of Université Laval as part of the master's program in Physiology-Endocrinology, for the degree of Master of Science (M.Sc.). Department of Obstetrics and Gynecology, Faculty of Medicine, Université Laval, Québec, 2012. © Marie Eve Bergeron, 2012
    Abstract: The expression levels of many genes can be affected by the environment and lead to the development of cryptorchidism. This is the most common congenital malformation in males; one of its major consequences is male infertility due to the undescended testis, which is also associated with a higher risk of testicular cancer. Total RNA isolated from testicular biopsies was analyzed by microarrays, followed by bioinformatic analysis and RT-qPCR validation of several selected genes. These analyses allowed me to identify more than two thousand candidates showing differential expression between cryptorchid and normal subjects. Some of these selected genes may be associated with testicular descent, others with testicular cancer or with the various cell types found in this organ. The differences in the transcriptome due to cryptorchidism will help us understand the genetic cause of this disease.
  • A Dissertation Entitled The Androgen Receptor as a Transcriptional Co-activator: Implications in the Growth and Progression of Prostate Cancer
    A Dissertation entitled The Androgen Receptor as a Transcriptional Co-activator: Implications in the Growth and Progression of Prostate Cancer, by Mesfin Gonit. Submitted to the Graduate Faculty as partial fulfillment of the requirements for the PhD Degree in Biomedical Science.
    Dr. Manohar Ratnam, Committee Chair; Dr. Lirim Shemshedini, Committee Member; Dr. Robert Trumbly, Committee Member; Dr. Edwin Sanchez, Committee Member; Dr. Beata Lecka-Czernik, Committee Member; Dr. Patricia R. Komuniecki, Dean, College of Graduate Studies. The University of Toledo, August 2011.
    Copyright 2011, Mesfin Gonit. This document is copyrighted material. Under copyright law, no parts of this document may be reproduced without the expressed permission of the author.
    Abstract: Prostate cancer depends on the androgen receptor (AR) for growth and survival even in the absence of androgen. In the classical models of gene activation by AR, ligand-activated AR signals through binding to the androgen response elements (AREs) in the target gene promoter/enhancer. In the present study, the role of AREs in androgen-independent transcriptional signaling was investigated using LP50 cells, derived from parental LNCaP cells through extended passage in vitro. LP50 cells reflected the signature gene overexpression profile of advanced clinical prostate tumors. The growth of LP50 cells was profoundly dependent on nuclear localized AR but was independent of androgen. Nevertheless, in these cells AR was unable to bind to AREs in the absence of androgen.
  • Rare Copy Number Variants Disrupt Genes Regulating Vascular Smooth Muscle Cell Adhesion and Contractility in Sporadic Thoracic Aortic Aneurysms and Dissections
    ARTICLE: Rare Copy Number Variants Disrupt Genes Regulating Vascular Smooth Muscle Cell Adhesion and Contractility in Sporadic Thoracic Aortic Aneurysms and Dissections
    Siddharth K. Prakash,1 Scott A. LeMaire,2,3 Dong-Chuan Guo,4 Ludivine Russell,2 Ellen S. Regalado,4 Hossein Golabbakhsh,4 Ralph J. Johnson,4 Hazim J. Safi,5 Anthony L. Estrera,5 Joseph S. Coselli,2,3 Molly S. Bray,1 Suzanne M. Leal,1 Dianna M. Milewicz,4 and John W. Belmont1,*
    Thoracic aortic aneurysms and dissections (TAAD) cause significant morbidity and mortality, but the genetic origins of TAAD remain largely unknown. In a genome-wide analysis of 418 sporadic TAAD cases, we identified 47 copy number variant (CNV) regions that were enriched in or unique to TAAD patients compared to population controls. Gene ontology, expression profiling, and network analysis showed that genes within TAAD CNVs regulate smooth muscle cell adhesion or contractility and interact with the smooth muscle-specific isoforms of α-actin and β-myosin, which are known to cause familial TAAD when altered. Enrichment of these gene functions in rare CNVs was replicated in independent cohorts with sporadic TAAD (STAAD, n = 387) and inherited TAAD (FTAAD, n = 88). The overall prevalence of rare CNVs (23%) was significantly increased in FTAAD compared with STAAD patients (Fisher’s exact test, p = 0.03). Our findings suggest that rare CNVs disrupting smooth muscle adhesion or contraction contribute to both sporadic and familial disease.
  • Supplemental Figure S1: Differentially Methylated Regions (DMRs)
    [Supplemental Figure S1: Differentially methylated regions between disease states. Panel A: unique hypermethylated genes (methylation sites); panel B: unique hypomethylated genes (methylation sites); groups compared: NL, MGUS, NEWMM and REL.]
    [Supplemental Figure S2: Validation of results from the HELP assay using Epityper MassArray. Panel A: ARID4B; panel B: DNMT3A; axes: methylation by MassArray and methylation by HELP assay; groups: NL PC, MGUS, NEW MM and REL MM.]
    Supplemental tables: clinical characteristics of patient samples (columns: DZ_STAGE, Age, Gender, Ethnicity, MM isotype, PCLI, Cytogenetics).
  • Original Article: A Database and Functional Annotation of NF-κB Target Genes
    Int J Clin Exp Med 2016;9(5):7986-7995 www.ijcem.com /ISSN:1940-5901/IJCEM0019172
    Original Article: A database and functional annotation of NF-κB target genes. Yang Yang, Jian Wu, Jinke Wang. The State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, People’s Republic of China. Received November 4, 2015; Accepted February 10, 2016; Epub May 15, 2016; Published May 30, 2016.
    Abstract: Background: Previous studies show that the transcription factor NF-κB can be induced by many inducers and can regulate the expression of many genes. The aim of the present study is to build a database of NF-κB target genes and annotate their functions. Methods: In this study, we manually collected the most complete listing of all NF-κB target genes identified to date, including the NF-κB microRNA target genes, built a database of NF-κB target genes with detailed information for each target gene, and annotated it with DAVID tools. Results: The NF-κB target genes database was established (http://tfdb.seu.edu.cn/nfkb/). The collected data confirmed that NF-κB has multitudinous biological functions and shows considerable complexity and diversity in regulating the expression of its target genes. The data showed that NF-κB is a central regulator of the stress response, immune response and cellular metabolic processes. NF-κB is involved in bone disease, immunological disease, cardiovascular disease, various cancers and nervous disease. NF-κB can modulate the expression activity of other transcription factors. Inhibition of IKK and IκBα phosphorylation, the decrease of nuclear translocation of p65 and the reduction of intracellular glutathione level determined the up-regulation or down-regulation of expression of NF-κB target genes.
  • Ana Isabel Borges Ferraz
    Faculty of Medicine, University of Coimbra. Final work of the sixth medical year, toward the award of the Master's degree within the Integrated Master's Program in Medicine.
    Ana Isabel Borges Ferraz. CLINICAL RELEVANCE OF COPY NUMBER VARIATIONS DETECTED BY ARRAY-CGH IN SIX PATIENTS WITH UNEXPLAINED NON-SYNDROMIC INTELLECTUAL DISABILITY. Scientific article, scientific area of Pediatrics. Work carried out under the supervision of Professor Dr. Guiomar Oliveira and Professor Dr. Patrícia Maciel. March 2012.
    Ana Isabel Borges Ferraz, Integrated Master's in Medicine, 6th year, Faculty of Medicine, University of Coimbra. Address: Torre de Vilela, 3020-928 Coimbra.
    Ana Ferraz 1,2, G. Oliveira 1, P. Maciel 2. (1) Carmona da Mota Pediatric Hospital of Coimbra; (2) Life and Health Sciences Research Institute (ICVS), University of Minho.
    Keywords: Intellectual disability, array-CGH, copy number variation, genotype, phenotype, non-syndromic
    ABSTRACT: Intellectual disability (ID) is a health problem of great relevance for public health services and for families, and is one of the most common neurodevelopmental disorders, affecting 1 to 3% of children. Epidemiological studies show that genetic mutations account for about 15% of the etiology of the milder forms.
  • Supplementary Table 1. Mutated Genes That Contain Protein Domains Identified Through Mutation Enrichment Analysis
    Supplementary Table 1. Mutated genes that contain protein domains identified through mutation enrichment analysis
    A. Breast cancers (InterPro ID: mutated genes, number of mutations in parentheses)
    IPR000219: ARHGEF4(2), ECT2(1), FARP1(1), FLJ20184(1), MCF2L2(1), NET1(1), OBSCN(5), RASGRF2(2), TRAD(1), VAV3(1)
    IPR000225: APC2(2), JUP(1), KPNA5(2), SPAG6(1)
    IPR000357: ARFGEF2(2), CMYA4(1), DRIM(2), JUP(1), KPNA5(2), PIK3R4(1), SPAG6(1)
    IPR000533: AKAP9(2), C10orf39(1), C20orf23(1), CUTL1(1), HOOK1(1), HOOK3(1), KTN1(2), LRRFIP1(3), MYH1(3), MYH9(2), NEF3(1), NF2(1), RSN(1), TAX1BP1(1), TPM4(1)
    IPR000694: ADAM12(3), ADAMTS19(1), APC2(2), APXL(1), ARID1B(1), BAT2(2), BAT3(1), BCAR1(1), BCL11A(2), BCORL1(1), C14orf155(3), C1orf2(1), C1QB(1), C6orf31(1), C7orf11(1), CD2(1), CENTD3(3), CHD5(3), CIC(3), CMYA1(2), COL11A1(3), COL19A1(2), COL7A1(3), DAZAP1(1), DBN1(3), DVL3(1), EIF5(1), FAM44A(1), FAM47B(1), FHOD1(1), FLJ20584(1), G3BP2(2), GAB1(2), GGA3(1), GLI1(3), GPNMB(2), GRIN2D(3), HCN3(1), HOXA3(2), HOXA4(1), IRS4(1), KCNA5(1), KCNC2(1), LIP8(1), LOC374955(1), MAGEE1(2), MICAL1(2), MICAL‐L1(1), MLLT2(1), MMP15(1), N4BP2(1), NCOA6(2), NHS(1), NUP214(3), ODZ1(3), PER1(2), PER2(1), PHC1(1), PLXNB1(1), PPM1E(2), RAI17(2), RAPH1(2), RBAF600(2), SCARF2(1), SEMA4G(1), SLC16A2(1), SORBS1(1), SPEN(2), SPG4(1), TBX1(1), TCF1(2), TCF7L1(1), TESK1(1), THG‐1(1), TP53(18), TRIF(1), ZBTB3(2), ZNF318(2)
    IPR000909: CENTB1(2), PLCB1(1), PLCG1(1)
    IPR000998: AEGP(3), EGFL6(2), PRSS7(1)
    IPR001140: ABCB10(2), ABCB6(1), ABCB8(2)
    IPR001164: ARFGAP3(1), CENTB1(2), CENTD3(3), CENTG1(2)
    IPR001589: ...
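The HepG2 decay table in the list above reports a decay rate r in hour^-1 next to a "Half-life < 2hr" column. Assuming simple first-order (exponential) decay, which the rate units suggest but the snippet does not state, the two columns are related as below. Applied to a group-mean rate, the conversion is only indicative, since the mean of r does not map to a typical half-life.

```latex
\begin{aligned}
N(t) &= N_0\, e^{-r t}, \qquad \tfrac{1}{2} N_0 = N_0\, e^{-r\, t_{1/2}}
  \;\Longrightarrow\; t_{1/2} = \frac{\ln 2}{r} \\
t_{1/2} < 2\ \mathrm{h} &\iff r > \frac{\ln 2}{2\ \mathrm{h}} \approx 0.347\ \mathrm{h}^{-1} \\
r = 0.221\ \mathrm{h}^{-1}\ (\text{GO:0006350, group mean}) &\;\Rightarrow\; t_{1/2} \approx \frac{0.693}{0.221}\ \mathrm{h} \approx 3.1\ \mathrm{h}
\end{aligned}
```

A category whose mean rate is below the 0.347 h^-1 cutoff can still contain many probe sets above it; that fraction is presumably what the "Decay %" columns count (13.1% in the transcription category versus 4.5% outside it).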