Pattern Discovery and Cancer Gene Identification in Integrated Cancer

Total Page:16

File Type:pdf, Size:1020Kb

Pattern Discovery and Cancer Gene Identification in Integrated Cancer Pattern discovery and cancer gene identification in integrated cancer genomic data Qianxing Moa,b, Sijian Wangc, Venkatraman E. Seshana, Adam B. Olshend, Nikolaus Schultze, Chris Sandere, R. Scott Powersf, Marc Ladanyig, and Ronglai Shena,1 aDepartment of Epidemiology and Biostatistics, eComputational Biology Program, and gDepartment of Pathology and Human Oncology and Pathogenesis Program, Memorial Sloan–Kettering Cancer Center, New York, NY 10065; bDepartment of Medicine and Dan L. Duncan Cancer Center, Baylor College of Medicine, Houston, TX 77030; cDepartment of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI 53792; dDepartment of Epidemiology and Biostatistics, University of California, San Francisco, CA 94107; and fCancer Genome Center, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11797 Edited by Peter J. Bickel, University of California, Berkeley, CA, and approved December 19, 2012 (received for review May 27, 2012) Large-scale integrated cancer genome characterization efforts in- integrates the information to extract biological principles from the cluding the cancer genome atlas and the cancer cell line encyclo- massive amount of data to provide useful insights for advancing pedia have created unprecedented opportunities to study cancer diagnostic, prognostic, and therapeutic strategies. biology in the context of knowing the entire catalog of genetic In a previous publication (8), we proposed an integrative alterations. A clinically important challenge is to discover cancer clustering framework called iCluster. The method was recently subtypes and their molecular drivers in a comprehensive genetic used in a landmark study to predict novel breast cancer subtypes context. Curtis et al. [Nature (2012) 486(7403):346–352] has re- with distinct clinical outcomes (9), and it was found that the joint cently shown that integrative clustering of copy number and gene clustering of copy number and gene expression profiles resolved expression in 2,000 breast tumors reveals novel subgroups beyond the considerable heterogeneity of the expression-only subgroups. the classic expression subtypes that show distinct clinical outcomes. Other approaches on data integration that have emerged in re- To extend the scope of integrative analysis for the inclusion of cent years include generalized data decomposition methods (10, somatic mutation data by massively parallel sequencing, we pro- 11) and nonparametric Bayesian models (12). However, two pose a framework for joint modeling of discrete and continuous major challenges have not yet been fully addressed. First, the variables that arise from integrated genomic, epigenomic, and existing methods are not designed to include both discrete (e.g., transcriptomic profiling. The core idea is motivated by the hypoth- somatic mutation) and continuous variables, thus limiting the esis that diverse molecular phenotypes can be predicted by a set of ability to harness the full potential of large-scale integrated ge- orthogonal latent variables that represent distinct molecular driv- nomic datasets. In fact, most of the previous methods have fo- ers, and thus can reveal tumor subgroups of biological and clinical cused on integrating only copy number and gene expression. importance. Using the cancer cell line encyclopedia dataset, we A second challenge that has not been fully addressed lies in demonstrate our method can accurately group cell lines by their systematically distinguishing cancer genes that are reliable and cell-of-origin for several cancer types, and precisely pinpoint their constant features of a subtype from those that are less reliable. known and potential cancer driver genes. Our integrative analysis To address these challenges, we present a significant enhance- also demonstrates the power for revealing subgroups that are not ment of the iCluster method, which we call iCluster+. The en- lineage-dependent, but consist of different cancer types driven by a hanced method can perform pattern discovery that integrates common genetic alteration. Application of the cancer genome atlas diverse data types: binary (somatic mutation), categorical (copy colorectal cancer data reveals distinct integrated tumor subtypes, number gain, normal, loss), and continuous (gene expression) suggesting different genetic pathways in colon cancer progression. values. In this paper, we demonstrate the power of this method for integrating the full spectrum of cancer genomic data using the multivariate generalized linear model | multidimensional data | CCLE and TCGA colorectal cancer datasets. A key aspect of the penalized regression method is to use generalized linear regression for the formulation of a joint model, with respect to a common set of latent variables major goal of many cancer genome projects is to characterize that we propose represents distinct driving factors (molecular eti- A key genetic alterations in cancer and discover therapeutic ology and genetic pathways). Geometrically, these latent variables targets through comprehensive genomic profiling of the cancer form a set of “principal” coordinates that span a lower dimensional genome. The Cancer Genome Atlas (TCGA) studies have un- integrated subspace, and collectively capture the major biological veiled the genetic landscape of several cancer types by whole-ge- variations observed across cancer genomes. As a result, the latent nome and whole-exome sequencing, DNA copy number profiling, variable approach enables rigorous analysis of the integrated ge- promoter methylation profiling, and mRNA expression profiling nomic data, as we show in this report can reveal common themes in a large number of tumors (1–5). Complementary to the tumor that sort the tumors into distinct subgroups of biological and project, the Cancer Cell Line Encyclopedia (CCLE) (6) and the clinical importance. To identify genomic features that contribute Sanger cell line project (7) has cataloged a compilation of ge- most to the biological variation and thus have direct relevance for netic and molecular data in almost 1,000 human cancer cell lines, characterizing the molecular subgroups, we apply a penalized coupled with pharmacological profiles for a large panel of anti- cancer drugs. These large-scale integrative genomic efforts have STATISTICS been geared toward comprehensively cataloging individual ge- Author contributions: Q.M., S.W., V.E.S., A.B.O., N.S., C.S., R.S.P., M.L., and R.S. designed nomic alterations, analogous to a reverse-engineering process research; Q.M., S.W., V.E.S., and R.S. performed research; Q.M., N.S., and R.S. analyzed where thousands of individual cancer genomes are taken apart to data; and Q.M., S.W., V.E.S., A.B.O., N.S., C.S., R.S.P., M.L., and R.S. wrote the paper. shed light on common biological principles. Unfortunately, cancer The authors declare no conflict of interest. genomes exhibit considerable heterogeneity with abnormalities This article is a PNAS Direct Submission. occurring in different genes among different individuals, posing Freely available online through the PNAS open access option. a great challenge to identify those genes with functional impor- 1To whom correspondence should be addressed. E-mail: [email protected]. tance and therapeutic implications. Thus, there is a corresponding This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. SYSTEMS BIOLOGY need for a forward-engineering process that synthesizes and 1073/pnas.1208949110/-/DCSupplemental. www.pnas.org/cgi/doi/10.1073/pnas.1208949110 PNAS | March 12, 2013 | vol. 110 | no. 11 | 4245–4250 Downloaded by guest on September 24, 2021 À Á likelihood approach (13) with lasso penalty terms (14, 15) to in- P xijt = 1 zi À Á = α + β z ; duce sparsity. The lasso regression allows us to pinpoint the subset log jt jt i of genomic features that have significant weights on the latent 1 − P xijt = 1 zi variables, which leads to enhanced interpretability and a more stable estimation of the latent variables. where and Pðxijt = 1jziÞ is the probability of gene j mutated in patient i given the value of the latent factor zi; αjt is an intercept Results term; and βjt is a length-k row vector of coefficients that de- iCluster+ Framework. iCluster+ integrates a diverse range of data termine the weights genomic variable j contributes to the types (Fig. 1). First, we introduce some notations. Let xijt denote latent variables. the genomic variable associated with the jth (j ∈ f1; ⋯; ptg) ge- If xijt is a multicategory variable (e.g., copy number states: loss/ nomic feature in the ith (i ∈ f1; ⋯; ng) sample of the tth normal/gain), we consider the following multilogit regression: (t ∈ f1; ⋯; mg) data type. A genomic feature can be either À Á À Á α + β z a protein-coding gene or non–gene-centric elements of interest exp jct jct i P xijt = c zi = P À Á; c = 1; ⋯; C; (genomic region, CpG sites, microRNA, etc.), depending on the C α + β z ℓ = 1 exp jℓt jℓt i data type. Let zi = ðzi1; ⋯; zikÞ′ be a column vector consisting of k unobserved latent variables. where fPðxijt = 1jziÞ; ⋯; Pðxijt = CjziÞg denote the probability of the The core idea is the following. We use a set of latent variables to states of the categorical variable (e.g., copy number loss, normal, represent k distinct driving factorsP (molecular drivers), which z α β p = p gain) given the value of i; jct is the intercept term; jct is a length-k predict the values of the original t t genomic variables, and row vector of regression coefficients for category c;andC is the collectively
Recommended publications
  • Influencers on Thyroid Cancer Onset: Molecular Genetic Basis
    G C A T T A C G G C A T genes Review Influencers on Thyroid Cancer Onset: Molecular Genetic Basis Berta Luzón-Toro 1,2, Raquel María Fernández 1,2, Leticia Villalba-Benito 1,2, Ana Torroglosa 1,2, Guillermo Antiñolo 1,2 and Salud Borrego 1,2,* 1 Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville (IBIS), University Hospital Virgen del Rocío/CSIC/University of Seville, 41013 Seville, Spain; [email protected] (B.L.-T.); [email protected] (R.M.F.); [email protected] (L.V.-B.); [email protected] (A.T.); [email protected] (G.A.) 2 Centre for Biomedical Network Research on Rare Diseases (CIBERER), 41013 Seville, Spain * Correspondence: [email protected]; Tel.: +34-955-012641 Received: 3 September 2019; Accepted: 6 November 2019; Published: 8 November 2019 Abstract: Thyroid cancer, a cancerous tumor or growth located within the thyroid gland, is the most common endocrine cancer. It is one of the few cancers whereby incidence rates have increased in recent years. It occurs in all age groups, from children through to seniors. Most studies are focused on dissecting its genetic basis, since our current knowledge of the genetic background of the different forms of thyroid cancer is far from complete, which poses a challenge for diagnosis and prognosis of the disease. In this review, we describe prevailing advances and update our understanding of the molecular genetics of thyroid cancer, focusing on the main genes related with the pathology, including the different noncoding RNAs associated with the disease.
    [Show full text]
  • Genome-Wide Analysis of Host-Chromosome Binding Sites For
    Lu et al. Virology Journal 2010, 7:262 http://www.virologyj.com/content/7/1/262 RESEARCH Open Access Genome-wide analysis of host-chromosome binding sites for Epstein-Barr Virus Nuclear Antigen 1 (EBNA1) Fang Lu1, Priyankara Wikramasinghe1, Julie Norseen1,2, Kevin Tsai1, Pu Wang1, Louise Showe1, Ramana V Davuluri1, Paul M Lieberman1* Abstract The Epstein-Barr Virus (EBV) Nuclear Antigen 1 (EBNA1) protein is required for the establishment of EBV latent infection in proliferating B-lymphocytes. EBNA1 is a multifunctional DNA-binding protein that stimulates DNA replication at the viral origin of plasmid replication (OriP), regulates transcription of viral and cellular genes, and tethers the viral episome to the cellular chromosome. EBNA1 also provides a survival function to B-lymphocytes, potentially through its ability to alter cellular gene expression. To better understand these various functions of EBNA1, we performed a genome-wide analysis of the viral and cellular DNA sites associated with EBNA1 protein in a latently infected Burkitt lymphoma B-cell line. Chromatin-immunoprecipitation (ChIP) combined with massively parallel deep-sequencing (ChIP-Seq) was used to identify cellular sites bound by EBNA1. Sites identified by ChIP- Seq were validated by conventional real-time PCR, and ChIP-Seq provided quantitative, high-resolution detection of the known EBNA1 binding sites on the EBV genome at OriP and Qp. We identified at least one cluster of unusually high-affinity EBNA1 binding sites on chromosome 11, between the divergent FAM55 D and FAM55B genes. A con- sensus for all cellular EBNA1 binding sites is distinct from those derived from the known viral binding sites, sug- gesting that some of these sites are indirectly bound by EBNA1.
    [Show full text]
  • List of Genes Associated with Sudden Cardiac Death (Scdgseta) Gene
    List of genes associated with sudden cardiac death (SCDgseta) mRNA expression in normal human heart Entrez_I Gene symbol Gene name Uniprot ID Uniprot name fromb D GTEx BioGPS SAGE c d e ATP-binding cassette subfamily B ABCB1 P08183 MDR1_HUMAN 5243 √ √ member 1 ATP-binding cassette subfamily C ABCC9 O60706 ABCC9_HUMAN 10060 √ √ member 9 ACE Angiotensin I–converting enzyme P12821 ACE_HUMAN 1636 √ √ ACE2 Angiotensin I–converting enzyme 2 Q9BYF1 ACE2_HUMAN 59272 √ √ Acetylcholinesterase (Cartwright ACHE P22303 ACES_HUMAN 43 √ √ blood group) ACTC1 Actin, alpha, cardiac muscle 1 P68032 ACTC_HUMAN 70 √ √ ACTN2 Actinin alpha 2 P35609 ACTN2_HUMAN 88 √ √ √ ACTN4 Actinin alpha 4 O43707 ACTN4_HUMAN 81 √ √ √ ADRA2B Adrenoceptor alpha 2B P18089 ADA2B_HUMAN 151 √ √ AGT Angiotensinogen P01019 ANGT_HUMAN 183 √ √ √ AGTR1 Angiotensin II receptor type 1 P30556 AGTR1_HUMAN 185 √ √ AGTR2 Angiotensin II receptor type 2 P50052 AGTR2_HUMAN 186 √ √ AKAP9 A-kinase anchoring protein 9 Q99996 AKAP9_HUMAN 10142 √ √ √ ANK2/ANKB/ANKYRI Ankyrin 2 Q01484 ANK2_HUMAN 287 √ √ √ N B ANKRD1 Ankyrin repeat domain 1 Q15327 ANKR1_HUMAN 27063 √ √ √ ANKRD9 Ankyrin repeat domain 9 Q96BM1 ANKR9_HUMAN 122416 √ √ ARHGAP24 Rho GTPase–activating protein 24 Q8N264 RHG24_HUMAN 83478 √ √ ATPase Na+/K+–transporting ATP1B1 P05026 AT1B1_HUMAN 481 √ √ √ subunit beta 1 ATPase sarcoplasmic/endoplasmic ATP2A2 P16615 AT2A2_HUMAN 488 √ √ √ reticulum Ca2+ transporting 2 AZIN1 Antizyme inhibitor 1 O14977 AZIN1_HUMAN 51582 √ √ √ UDP-GlcNAc: betaGal B3GNT7 beta-1,3-N-acetylglucosaminyltransfe Q8NFL0
    [Show full text]
  • A Computational Approach for Defining a Signature of Β-Cell Golgi Stress in Diabetes Mellitus
    Page 1 of 781 Diabetes A Computational Approach for Defining a Signature of β-Cell Golgi Stress in Diabetes Mellitus Robert N. Bone1,6,7, Olufunmilola Oyebamiji2, Sayali Talware2, Sharmila Selvaraj2, Preethi Krishnan3,6, Farooq Syed1,6,7, Huanmei Wu2, Carmella Evans-Molina 1,3,4,5,6,7,8* Departments of 1Pediatrics, 3Medicine, 4Anatomy, Cell Biology & Physiology, 5Biochemistry & Molecular Biology, the 6Center for Diabetes & Metabolic Diseases, and the 7Herman B. Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN 46202; 2Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN, 46202; 8Roudebush VA Medical Center, Indianapolis, IN 46202. *Corresponding Author(s): Carmella Evans-Molina, MD, PhD ([email protected]) Indiana University School of Medicine, 635 Barnhill Drive, MS 2031A, Indianapolis, IN 46202, Telephone: (317) 274-4145, Fax (317) 274-4107 Running Title: Golgi Stress Response in Diabetes Word Count: 4358 Number of Figures: 6 Keywords: Golgi apparatus stress, Islets, β cell, Type 1 diabetes, Type 2 diabetes 1 Diabetes Publish Ahead of Print, published online August 20, 2020 Diabetes Page 2 of 781 ABSTRACT The Golgi apparatus (GA) is an important site of insulin processing and granule maturation, but whether GA organelle dysfunction and GA stress are present in the diabetic β-cell has not been tested. We utilized an informatics-based approach to develop a transcriptional signature of β-cell GA stress using existing RNA sequencing and microarray datasets generated using human islets from donors with diabetes and islets where type 1(T1D) and type 2 diabetes (T2D) had been modeled ex vivo. To narrow our results to GA-specific genes, we applied a filter set of 1,030 genes accepted as GA associated.
    [Show full text]
  • Supplemental Information Proximity Interactions Among Centrosome
    Current Biology, Volume 24 Supplemental Information Proximity Interactions among Centrosome Components Identify Regulators of Centriole Duplication Elif Nur Firat-Karalar, Navin Rauniyar, John R. Yates III, and Tim Stearns Figure S1 A Myc Streptavidin -tubulin Merge Myc Streptavidin -tubulin Merge BirA*-PLK4 BirA*-CEP63 BirA*- CEP192 BirA*- CEP152 - BirA*-CCDC67 BirA* CEP152 CPAP BirA*- B C Streptavidin PCM1 Merge Myc-BirA* -CEP63 PCM1 -tubulin Merge BirA*- CEP63 DMSO - BirA* CEP63 nocodazole BirA*- CCDC67 Figure S2 A GFP – + – + GFP-CEP152 + – + – Myc-CDK5RAP2 + + + + (225 kDa) Myc-CDK5RAP2 (216 kDa) GFP-CEP152 (27 kDa) GFP Input (5%) IP: GFP B GFP-CEP152 truncation proteins Inputs (5%) IP: GFP kDa 1-7481-10441-1290218-1654749-16541045-16541-7481-10441-1290218-1654749-16541045-1654 250- Myc-CDK5RAP2 150- 150- 100- 75- GFP-CEP152 Figure S3 A B CEP63 – – + – – + GFP CCDC14 KIAA0753 Centrosome + – – + – – GFP-CCDC14 CEP152 binding binding binding targeting – + – – + – GFP-KIAA0753 GFP-KIAA0753 (140 kDa) 1-496 N M C 150- 100- GFP-CCDC14 (115 kDa) 1-424 N M – 136-496 M C – 50- CEP63 (63 kDa) 1-135 N – 37- GFP (27 kDa) 136-424 M – kDa 425-496 C – – Inputs (2%) IP: GFP C GFP-CEP63 truncation proteins D GFP-CEP63 truncation proteins Inputs (5%) IP: GFP Inputs (5%) IP: GFP kDa kDa 1-135136-424425-4961-424136-496FL Ctl 1-135136-424425-4961-424136-496FL Ctl 1-135136-424425-4961-424136-496FL Ctl 1-135136-424425-4961-424136-496FL Ctl Myc- 150- Myc- 100- CCDC14 KIAA0753 100- 100- 75- 75- GFP- GFP- 50- CEP63 50- CEP63 37- 37- Figure S4 A siCtl
    [Show full text]
  • 4-6 Weeks Old Female C57BL/6 Mice Obtained from Jackson Labs Were Used for Cell Isolation
    Methods Mice: 4-6 weeks old female C57BL/6 mice obtained from Jackson labs were used for cell isolation. Female Foxp3-IRES-GFP reporter mice (1), backcrossed to B6/C57 background for 10 generations, were used for the isolation of naïve CD4 and naïve CD8 cells for the RNAseq experiments. The mice were housed in pathogen-free animal facility in the La Jolla Institute for Allergy and Immunology and were used according to protocols approved by the Institutional Animal Care and use Committee. Preparation of cells: Subsets of thymocytes were isolated by cell sorting as previously described (2), after cell surface staining using CD4 (GK1.5), CD8 (53-6.7), CD3ε (145- 2C11), CD24 (M1/69) (all from Biolegend). DP cells: CD4+CD8 int/hi; CD4 SP cells: CD4CD3 hi, CD24 int/lo; CD8 SP cells: CD8 int/hi CD4 CD3 hi, CD24 int/lo (Fig S2). Peripheral subsets were isolated after pooling spleen and lymph nodes. T cells were enriched by negative isolation using Dynabeads (Dynabeads untouched mouse T cells, 11413D, Invitrogen). After surface staining for CD4 (GK1.5), CD8 (53-6.7), CD62L (MEL-14), CD25 (PC61) and CD44 (IM7), naïve CD4+CD62L hiCD25-CD44lo and naïve CD8+CD62L hiCD25-CD44lo were obtained by sorting (BD FACS Aria). Additionally, for the RNAseq experiments, CD4 and CD8 naïve cells were isolated by sorting T cells from the Foxp3- IRES-GFP mice: CD4+CD62LhiCD25–CD44lo GFP(FOXP3)– and CD8+CD62LhiCD25– CD44lo GFP(FOXP3)– (antibodies were from Biolegend). In some cases, naïve CD4 cells were cultured in vitro under Th1 or Th2 polarizing conditions (3, 4).
    [Show full text]
  • UC San Diego Electronic Theses and Dissertations
    UC San Diego UC San Diego Electronic Theses and Dissertations Title Negative Correlation between SMARCAD1 and Histone Citrulline Protein Expression in Cancer and Non-cancerous Cell Lines Permalink https://escholarship.org/uc/item/5078x69h Author Zhao, Tianyi Publication Date 2016 Peer reviewed|Thesis/dissertation eScholarship.org Powered by the California Digital Library University of California UNIVERSITY OF CALIFORNIA, SAN DIEGO Negative Correlation between SMARCAD1 and Histone Citrulline Protein Expression in Normal and Cancer Cell Lines A Thesis submitted in partial satisfaction of the requirements of the degree Master of Science in Bioengineering by Tianyi Zhao Committee in charge: Professor Sheng Zhong, Chair Professor Prashant Gulab Ram Mali Professor Yingxiao Wang 2016 Copyright Tianyi Zhao, 2016 All rights reserved. The Thesis of Tianyi Zhao is approved and it is acceptable in quality and form for publication on microfilm and electronically: Chair University of California, San Diego 2016 iii Table of Contens Signature Page ..................................................................................................... iii Table of Contents ................................................................................................. iv List of Figures ....................................................................................................... v List of Tables ....................................................................................................... vii List of Graph ......................................................................................................
    [Show full text]
  • Genome-Wide Rnai Screening Identifies Human Proteins with A
    RESOURCES Genome-wide RNAi screening identifies human proteins with a regulatory function in the early secretory pathway Jeremy C. Simpson1,7, Brigitte Joggerst2, Vibor Laketa2, Fatima Verissimo2, Cihan Cetin2, Holger Erfle2,6, Mariana G. Bexiga1, Vasanth R. Singan1, Jean-Karim Hériché3, Beate Neumann3, Alvaro Mateos2, Jonathon Blake4, Stephanie Bechtel5, Vladimir Benes4, Stefan Wiemann5, Jan Ellenberg2,3 and Rainer Pepperkok2,7 The secretory pathway in mammalian cells has evolved to facilitate the transfer of cargo molecules to internal and cell surface membranes. Use of automated microscopy-based genome-wide RNA interference screens in cultured human cells allowed us to identify 554 proteins influencing secretion. Cloning, fluorescent-tagging and subcellular localization analysis of 179 of these proteins revealed that more than two-thirds localize to either the cytoplasm or membranes of the secretory and endocytic pathways. The depletion of 143 of them resulted in perturbations in the organization of the COPII and/or COPI vesicular coat complexes of the early secretory pathway, or the morphology of the Golgi complex. Network analyses revealed a so far unappreciated link between early secretory pathway function, small GTP-binding protein regulation, actin cytoskeleton organization and EGF-receptor-mediated signalling. This work provides an important resource for an integrative understanding of global cellular organization and regulation of the secretory pathway in mammalian cells. Within higher eukaryotic cells membrane traffic pathways connect the Extensive efforts over many years have revealed a significant number various membrane-bounded organelles, thereby ensuring that they of regulators associated with the secretory pathway. Early biochemical retain the correct complement of proteins and lipids to maintain approaches to identify individual machinery components have started cellular homeostasis.
    [Show full text]
  • Supplemental Information
    Supplemental information Dissection of the genomic structure of the miR-183/96/182 gene. Previously, we showed that the miR-183/96/182 cluster is an intergenic miRNA cluster, located in a ~60-kb interval between the genes encoding nuclear respiratory factor-1 (Nrf1) and ubiquitin-conjugating enzyme E2H (Ube2h) on mouse chr6qA3.3 (1). To start to uncover the genomic structure of the miR- 183/96/182 gene, we first studied genomic features around miR-183/96/182 in the UCSC genome browser (http://genome.UCSC.edu/), and identified two CpG islands 3.4-6.5 kb 5’ of pre-miR-183, the most 5’ miRNA of the cluster (Fig. 1A; Fig. S1 and Seq. S1). A cDNA clone, AK044220, located at 3.2-4.6 kb 5’ to pre-miR-183, encompasses the second CpG island (Fig. 1A; Fig. S1). We hypothesized that this cDNA clone was derived from 5’ exon(s) of the primary transcript of the miR-183/96/182 gene, as CpG islands are often associated with promoters (2). Supporting this hypothesis, multiple expressed sequences detected by gene-trap clones, including clone D016D06 (3, 4), were co-localized with the cDNA clone AK044220 (Fig. 1A; Fig. S1). Clone D016D06, deposited by the German GeneTrap Consortium (GGTC) (http://tikus.gsf.de) (3, 4), was derived from insertion of a retroviral construct, rFlpROSAβgeo in 129S2 ES cells (Fig. 1A and C). The rFlpROSAβgeo construct carries a promoterless reporter gene, the β−geo cassette - an in-frame fusion of the β-galactosidase and neomycin resistance (Neor) gene (5), with a splicing acceptor (SA) immediately upstream, and a polyA signal downstream of the β−geo cassette (Fig.
    [Show full text]
  • Cytoplasmic E2f4 Forms Organizing Centres for Initiation of Centriole Amplification During Multiciliogenesis
    ARTICLE Received 13 Feb 2017 | Accepted 8 May 2017 | Published 4 Jul 2017 DOI: 10.1038/ncomms15857 OPEN Cytoplasmic E2f4 forms organizing centres for initiation of centriole amplification during multiciliogenesis Munemasa Mori1, Renin Hazan2, Paul S. Danielian2, John E. Mahoney1,w, Huijun Li1, Jining Lu1, Emily S. Miller2, Xueliang Zhu3, Jacqueline A. Lees2 & Wellington V. Cardoso1 Abnormal development of multiciliated cells is a hallmark of a variety of human conditions associated with chronic airway diseases, hydrocephalus and infertility. Multiciliogenesis requires both activation of a specialized transcriptional program and assembly of cytoplasmic structures for large-scale centriole amplification that generates basal bodies. It remains unclear, however, what mechanism initiates formation of these multiprotein complexes in epithelial progenitors. Here we show that this is triggered by nucleocytoplasmic translocation of the transcription factor E2f4. After inducing a transcriptional program of centriole biogenesis, E2f4 forms apical cytoplasmic organizing centres for assembly and nucleation of deuterosomes. Using genetically altered mice and E2F4 mutant proteins we demonstrate that centriole amplification is crucially dependent on these organizing centres and that, without cytoplasmic E2f4, deuterosomes are not assembled, halting multiciliogenesis. Thus, E2f4 integrates nuclear and previously unsuspected cytoplasmic events of centriole amplification, providing new perspectives for the understanding of normal ciliogenesis, ciliopathies and cancer. 1 Columbia Center for Human Development, Department of Medicine, Pulmonary Allergy Critical Care, Columbia University Medical Center, New York City, New York 10032, USA. 2 David H. Koch Institute for Integrative Cancer Research, MIT, Cambridge, Massachusetts 02139, USA. 3 State Key Laboratory of Cell Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China.
    [Show full text]
  • Molecular Mechanism of the MORC4 Atpase Activation
    ARTICLE https://doi.org/10.1038/s41467-020-19278-8 OPEN Molecular mechanism of the MORC4 ATPase activation Adam H. Tencer1, Khan L. Cox2, Gregory M. Wright1, Yi Zhang 1, Christopher J. Petell3, Brianna J. Klein1, ✉ Brian D. Strahl 3, Joshua C. Black1, Michael G. Poirier2 & Tatiana G. Kutateladze 1 Human Microrchidia 4 (MORC4) is associated with acute and chronic pancreatitis, inflam- matory disorders and cancer but it remains largely uncharacterized. Here, we describe the 1234567890():,; structure–function relationship of MORC4 and define the molecular mechanism for MORC4 activation. Enzymatic and binding assays reveal that MORC4 has ATPase activity, which is dependent on DNA-binding functions of both the ATPase domain and CW domain of MORC4. The crystal structure of the ATPaseCW cassette of MORC4 and mutagenesis studies show that the DNA-binding site and the histone/ATPase binding site of CW are located on the opposite sides of the domain. The ATPase and CW domains cooperate in binding of MORC4 to the nucleosome core particle (NCP), enhancing the DNA wrapping around the histone core and impeding binding of DNA-associated proteins, such as tran- scription factors, to the NCP. In cells, MORC4 mediates formation of nuclear bodies in the nucleus and has a role in the progression of S-phase of the cell cycle, and both these functions require CW and catalytic activity of MORC4. Our findings highlight the mechanism for MORC4 activation, which is distinctly different from the mechanisms of action observed in other MORC family members. 1 Department of Pharmacology, University of Colorado School of Medicine, Aurora, CO 80045, USA.
    [Show full text]
  • Supplementary Table S4. FGA Co-Expressed Gene List in LUAD
    Supplementary Table S4. FGA co-expressed gene list in LUAD tumors Symbol R Locus Description FGG 0.919 4q28 fibrinogen gamma chain FGL1 0.635 8p22 fibrinogen-like 1 SLC7A2 0.536 8p22 solute carrier family 7 (cationic amino acid transporter, y+ system), member 2 DUSP4 0.521 8p12-p11 dual specificity phosphatase 4 HAL 0.51 12q22-q24.1histidine ammonia-lyase PDE4D 0.499 5q12 phosphodiesterase 4D, cAMP-specific FURIN 0.497 15q26.1 furin (paired basic amino acid cleaving enzyme) CPS1 0.49 2q35 carbamoyl-phosphate synthase 1, mitochondrial TESC 0.478 12q24.22 tescalcin INHA 0.465 2q35 inhibin, alpha S100P 0.461 4p16 S100 calcium binding protein P VPS37A 0.447 8p22 vacuolar protein sorting 37 homolog A (S. cerevisiae) SLC16A14 0.447 2q36.3 solute carrier family 16, member 14 PPARGC1A 0.443 4p15.1 peroxisome proliferator-activated receptor gamma, coactivator 1 alpha SIK1 0.435 21q22.3 salt-inducible kinase 1 IRS2 0.434 13q34 insulin receptor substrate 2 RND1 0.433 12q12 Rho family GTPase 1 HGD 0.433 3q13.33 homogentisate 1,2-dioxygenase PTP4A1 0.432 6q12 protein tyrosine phosphatase type IVA, member 1 C8orf4 0.428 8p11.2 chromosome 8 open reading frame 4 DDC 0.427 7p12.2 dopa decarboxylase (aromatic L-amino acid decarboxylase) TACC2 0.427 10q26 transforming, acidic coiled-coil containing protein 2 MUC13 0.422 3q21.2 mucin 13, cell surface associated C5 0.412 9q33-q34 complement component 5 NR4A2 0.412 2q22-q23 nuclear receptor subfamily 4, group A, member 2 EYS 0.411 6q12 eyes shut homolog (Drosophila) GPX2 0.406 14q24.1 glutathione peroxidase
    [Show full text]