GenePool: A Cloud-Based Platform for Interactive Visualization and Integrative Analysis of Genomics and Clinical Data

Hua Fan-Minogue, MD, PhD1*, Marina Sirota, PhD1*, Sandeep Sanga, PhD2*, Dexter Hadley, MD, PhD1, Atul J. Butte, MD, PhD1, Tod Klingler, PhD2

1Division of System Medicine, Department of Pediatrics, Stanford School of Medicine, Stanford, CA; 2Station X Inc, San Francisco, CA

Abstract

Advances in genomic technology, harnessed by large-scale efforts from interdisciplinary consortia, have generated more comprehensive genomic data and provided unprecedented opportunities for understanding human health and disease. The Cancer Genome Atlas (TCGA) in particular contains comprehensive clinical, genomic and transcriptomic measurements across over 16,000 patient samples and over 30 tumor types. However, the vast volume and complexity of these data have also made it challenging for the biomedical research community to translate them into insights about disease quickly and reliably. Here we present a cloud-based platform, GenePool, which allows rapid interrogation of multi-genome data and their associated metadata from public or private datasets in a secure and interactive way. We demonstrate the functionality of GenePool with two use cases highlighting interactive sample browsing and selection, comparative analysis between disease subtypes, and integrated analysis of genomic, proteomic and phenotypic data. These examples also show the promise of GenePool to accelerate the identification of genetic contributions to human disease and the translation of these findings into clinically actionable results.
Introduction

Since the completion of the Human Genome Project, tremendous technological progress has enabled the generation of genomic data in a remarkably high-throughput and cost-effective manner, as strikingly illustrated by next-generation sequencing technologies (1). The resulting unprecedented volume of sequence data has facilitated the identification of genes associated with disease or drug response, some of which have led to improved therapies (2). Large interdisciplinary consortia have played a major role in harnessing these advances and building comprehensive catalogues of genomic data. For example, The Cancer Genome Atlas (TCGA) characterizes cancer transcriptomics and genomics data along with phenotypic data (3), the 1000 Genomes Project focuses in depth on human genetic variation (www.1000genomes.org), and the Genotype-Tissue Expression (GTEx) program aims to identify correlations between genetic variation and gene expression in multiple tissues (commonfund.nih.gov/GTEx). Various types of data, including clinical, genetic variation, mRNA expression, methylation, copy number variation and others, have been generated and released as resources to the community.

The rapid growth of these large-scale genomics efforts brings new challenges to biomedical research. The raw and processed data often reside in the repository of each individual consortium, and access requires manual downloading through each consortium's data portal. Users must also make sure they have sufficient computing power and data storage capacity. Furthermore, analysis of these data requires specialized bioinformatics knowledge of the computing algorithms and statistical methods appropriate to genomic data, which is not readily available in most biomedical research labs (4). Data management, query and analysis have therefore become new challenges for efficiently translating the abundance of data into knowledge about disease.
Some solutions are available that enable easy access to the data. For example, the data portal of the International Cancer Genome Consortium (ICGC) allows quick browsing or querying of genomics data from cancer projects, including TCGA (5). COSMIC (Catalogue of Somatic Mutations in Cancer) (6) and cBioPortal (7) provide easy access to curated, comprehensive information on cancer genomics data. However, these data portals have little or no capability for real-time data analysis. The Broad Institute (www.broadinstitute.org) offers a suite of online analysis tools for large genome-related datasets. Although these tools can be used against the existing datasets of the Broad Institute, applying them to other datasets requires downloading the chosen tools locally. Cloud computing technology has been increasingly used to meet the need for easy access to analysis tools, as well as for storage and retrieval of large datasets. One example is CloudMap (8); however, it is applied to only a single type of data, and integrative analysis is not available. A user-friendly solution for accessing, processing, analyzing and integrating genomic, clinical and other types of data is therefore currently lacking.

* These individuals contributed equally. ¶ Corresponding author.

Here we present a cloud-based platform, GenePool™ (https://stationx.mygenepool.com), which aims to provide a one-stop solution for genomic data analysis and integration. It offers secure storage, genomics workflows, interactive visualization and interpretation of genomics data obtained from various genomic technologies, as well as associated metadata. It currently holds over 10 public datasets across several modalities, including TCGA and GTEx, and multiple biological annotation engines, such as Gene Ontology (GO) and Disease Ontology (DO). GenePool allows import of external genomic data for customized use.
It also allows reporting and secure sharing of results, with multiple options for annotation, sorting and data export. To demonstrate its function and usability, we applied GenePool to the TCGA datasets for two subtypes of lung cancer, Lung Adenocarcinoma (LUAD) and Lung Squamous Cell Carcinoma (LUSC). For each subtype, we first performed interactive browsing and querying, then carried out a differential expression analysis. We identified genes that were unique to each subtype as well as a common lung cancer signature. In the second use case, we carried out a deeper analysis of a specific subgroup of LUAD patients, focusing on individuals with a history of smoking, across multiple modalities including gene expression, somatic mutation and protein expression data.

Methods

System architecture

GenePool runs on Amazon Web Services (http://aws.amazon.com/) and takes advantage of EC2 compute, S3 storage and other AWS services to maintain a secure computing and storage environment (Figure 1a). Its backend code is primarily implemented in Java and interacts with a statistical engine powered by R to run select computations. Its front end primarily uses JavaScript and HTML5. It leverages a hybrid database layer to manage and store genomics and annotation data. This hybrid data layer consists of a PostgreSQL database and a flat-file, NoSQL-like component. The supported data types include expression data (RNAseq and miRNAseq) in BAM format or as tab-delimited files, and sequence variation data (whole genomes and exomes, copy number and DNA methylation) as mutation calls in VCF format. Users can import their own genomics data and/or work with the GenePool reference library. GenePool has canned workflows, but also allows customized workflows developed with its APIs.
The software has a fully integrated multi-genome browser and provides an easy way to store and select genomics datasets for analysis according to sample-associated metadata, as well as an automated system for performing routine, well-characterized genomics workflows through an intuitive interface.

Figure 1. Overview of the GenePool platform. a. System architecture. b. Functionality.

Analysis tools

GenePool has well-characterized workflows for various types of genomic data analysis, such as expression profiling and comparisons, sequence variation profiling and comparisons, time-to-endpoint analysis, paired and trio analysis, and gene enrichment analysis. A variety of statistical methods are available as options for the user to choose from, such as the t-test or likelihood ratio test for expression comparisons; the chi-squared test for variation comparisons; Fisher's exact test for gene enrichment analysis; the univariate Cox proportional hazards test for time-to-endpoint analysis; and q-values for estimating FDR. The results can be visualized as heat maps, co-expression networks, Kaplan-Meier survival curves, scatter plots, histograms and bar charts (Figure 1b).

Scalability

The combination of Amazon cloud infrastructure and GenePool's own platform architecture gives it elastic scalability: available compute capacity can be scaled up or down depending on demand. Using load balancing and server and application metrics, GenePool can sense periods of high demand and add compute to the middle application tier as required. This additional compute is used to meet the demands of customer workloads, particularly during high concurrency and during periods when large analyses are being performed. This capability allows GenePool to meet the ever-changing and unpredictable demands of our users' workloads. This elastic scalability also benefits the data import process.
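The scale-out behaviour of the middle application tier described above can be pictured as a simple threshold rule. The sketch below is illustrative only: the metric names, the thresholds, the server limits and the desired_servers function are all assumptions made for this example and are not taken from GenePool's implementation or from any AWS API.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class TierMetrics:
    """Snapshot of the middle application tier (all fields hypothetical)."""
    cpu_utilisation: float  # mean CPU load across servers, 0.0 to 1.0
    concurrent_jobs: int    # analyses currently running or queued
    servers: int            # servers currently in the tier


def desired_servers(m, jobs_per_server=4, high_cpu=0.75, low_cpu=0.25,
                    min_servers=1, max_servers=16):
    """Return the server count a threshold-based scaler would request.

    All default values are assumed for illustration, not measured.
    """
    needed_for_jobs = ceil(m.concurrent_jobs / jobs_per_server)
    target = m.servers
    if m.cpu_utilisation > high_cpu or needed_for_jobs > m.servers:
        # High demand: add at least one server, more if the job count needs it.
        target = max(m.servers + 1, needed_for_jobs)
    elif m.cpu_utilisation < low_cpu and needed_for_jobs < m.servers:
        # Low demand: release one server at a time to avoid oscillation.
        target = m.servers - 1
    # Keep the result within the configured bounds.
    return max(min_servers, min(max_servers, target))


if __name__ == "__main__":
    busy = TierMetrics(cpu_utilisation=0.9, concurrent_jobs=10, servers=2)
    idle = TierMetrics(cpu_utilisation=0.1, concurrent_jobs=2, servers=3)
    print(desired_servers(busy))  # scale out
    print(desired_servers(idle))  # scale in
```

Scaling in by one server at a time while scaling out by as many as the job count requires is a deliberate asymmetry in this sketch: it favours responsiveness under load over cost savings when demand drops.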
As data import jobs can vary in size and computational requirements, each job is assessed for its computational needs and a dedicated server is instantiated for that job. This allows GenePool to scale vertically, applying large multiprocessor servers to large jobs so that data is imported quickly and efficiently. Depending on the type of data being imported, GenePool can