Population-Haplotype Models for Mapping and Tagging Structural Variation Using Whole Genome Sequencing

Eleni Loizidou

Submitted in part fulfilment of the requirements for the degree of Doctor of Philosophy

Section of Genomics of Common Disease
Department of Medicine
Imperial College London, 2018

Declaration of originality

I hereby declare that the thesis submitted for a Doctor of Philosophy degree is based on my own work. Proper referencing is given to the organisations/cohorts I collaborated with during the project.

Copyright Declaration

The copyright of this thesis rests with the author and is made available under a Creative Commons Attribution Non-Commercial No Derivatives licence. Researchers are free to copy, distribute or transmit the thesis on the condition that they attribute it, that they do not use it for commercial purposes and that they do not alter, transform or build upon it. For any reuse or redistribution, researchers must make clear to others the licence terms of this work.

Abstract

The scientific interest in copy number variation (CNV) is rapidly increasing, mainly due to the evidence of phenotypic effects and its contribution to disease susceptibility. Single nucleotide polymorphisms (SNPs), which are abundant in the human genome, have been widely investigated in genome-wide association studies (GWAS). Despite the notable genomic effects of both CNVs and SNPs, the correlation between them has been relatively understudied. In the past decade, next generation sequencing (NGS) has been the leading high-throughput technology for investigating CNVs and offers mapping at a high-quality resolution. We created a map of NGS-defined CNVs tagged by SNPs using the 1000 Genomes Project phase 3 (1000G) sequencing data to examine patterns between the two types of variation in protein-coding genes.
To investigate potential relationships between CNV-tagging SNPs and various phenotypes, we used SNPs reported for disease/phenotype associations from the GWAS catalog. Moreover, we applied our method to DIAGRAM consortium and Northern Finland Birth Cohort (NFBC) data. Our analysis replicated existing CNV-tagging SNPs but also revealed novel relationships between them in almost all the datasets we analysed. We have developed a statistical framework under a population perspective for fast and accurate CNV detection. Using 202 drug-target genes defined in collaboration with GlaxoSmithKline (GSK), we applied our framework to the 1000G data. We calculated summary statistics based on the detected CNV calls, including the allele frequency (AF) for each of the 26 populations of the 1000G. In addition, we visualised our results using UCSC genome browser visualisation tracks for all 202 regions and successfully benchmarked our CNV calls by comparing them to a gold standard set of the 1000G CNVs. Overall, in this thesis we present detailed maps of CNVs and CNV-tagging SNPs to enhance existing knowledge of their impact on the human genome.

To my parents, my everything

Acknowledgments

I would like to express my deepest gratitude to my supervisor, Dr. Inga Prokopenko, for her valuable advice, guidance and patience throughout my PhD journey. Her perfectionism and level of sophistication and experience were always a source of inspiration to me. Above all, I need to say thank you for teaching me how to be a true scientist and how to grow academically and scientifically.

I am also deeply grateful to my co-supervisor Dr. Evangelos Bellos, whose support and academic brilliance paved the way and made this long journey look shorter. His encouragement and calm reaction to every situation have been my source of power and strength. A thank you is not enough to express my appreciation for the faith you showed in me.

I would also like to express my gratitude to my co-supervisors Prof. Michael Johnson and Dr. Lachlan J. M. Coin for their valuable advice and for accepting me in their research groups. Last but not least, I would like to truly thank Dr. Leonardo Bottolo, the person who initially believed in my abilities and is indirectly responsible for me being where I am today.

My research project would never have been initiated and completed without the support of the Medical Research Council (MRC) and GlaxoSmithKline (GSK). I would therefore like to express my strongest appreciation for them both.

I owe special thanks to the Cyprus Institute of Neurology and Genetics and specifically to Prof. Kyproula Christodoulou and Dr. George Spyrou for generously hosting me at the Bioinformatics department in Cyprus for the last year of my PhD. Thank you for giving me the chance to attend the department’s conferences as a speaker and for treating me as part of your team.

This long adventure seemed shorter with the support of my colleagues, whom I am lucky to now call friends. Special thanks to Dr. Sadia Saeed, Dr. Marika Kaakinen, Dr. Amna Khamis, Charalambos Kkoufou, Mila Anasanti, Dr. Hutokshi Crouch, Abdullah Abdulshakur and Jani Heikkinen for the unforgettable moments, outings and unstoppable laughter. I would also like to thank Patricia Murphy for her assistance with several administrative issues since the first day of my PhD and for her support throughout the years. I am mostly thankful for being a member of an international department which provided me with the opportunity to meet people from different cultures and mentalities.

Finally, since this thesis is the culmination of my PhD journey, I would like to say the biggest and warmest thank you to my family, my favourite people, who have always been my driving force, my inspiration, my inner power. My sisters Antigoni and Marina are a gift from God to me, and their love and support are always unconditional.
Thank you for constantly being by my side to encourage me, even during periods of worry and frustration. My husband George, the man I am now sharing my life with and the person who proves every single day that true love exists. His patience during the three years we were living in different places so I could fulfil my dreams has given me the strength to move on.

Even though a few words and a thank you will never be enough, I will try to express my deepest gratitude to the two people to whom I owe everything in life: my parents, who provided me with the greatest values. Just by being themselves, they taught me that the biggest achievement is to first be a good person and always treat others in the way you would like to be treated. They never stopped believing in me even when I doubted myself and supported my decisions no matter what. Through their actions, they proved that even with the greatest achievements, the important thing is to remain modest and keep working with passion and self-respect for the best outcome. Their passion for work and their love for humanity were the reasons I gained my scientific curiosity. Thank you for loving me unconditionally and giving me the “supplies” to be the person I am today and to have a successful future.

Table of Contents

Abstract
Acknowledgments
Table of Contents
List of figures
List of tables
List of Abbreviations
Chapter 1: Introduction
1.1. Human genome
1.1.1. Human genome variation – Single Nucleotide Polymorphisms (SNPs) and Structural Variation (SV)
1.1.2. CNV description
1.2. Sequencing the human genome
1.2.1. Uncovering CNVs
1.2.2. 1000 Genomes Sequencing Project (1000G)
1.2.3. Data generated by 1000 Genomes Project
1.2.4. Next generation DNA sequencing methods
1.2.5. Whole-genome and whole-exome sequencing