[Supplementary Documents]

Divine: Prioritizing Genes for Rare Mendelian Disease in Whole Exome Sequencing Data

Changjin Hong, Jean R. Clemenceau, Yunku Yeu, and TaeHyun Hwang*

Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Avenue, Cleveland, OH 44195

Contents

1 Workflow
2 Annotation
3 Requirement
3.1 Acceptable HPO inputs
3.2 Input and output
3.3 Installation and manual
4 Methods
4.1 Comparing known disease phenotypes with patient phenotypes
4.2 Damage prediction from genetic information
4.2.1 Pathogenic likelihood from AA
4.2.2 Functional impact by variant location
4.2.3 Pathogenic variant density in a protein domain
4.2.4 Pathogenic score for a mutated gene
4.3 Phenotype gene enrichment
4.3.1 Gene ontology enrichment
4.4 KEGG pathway enrichment
4.5 Combining Gi and Pi and a final ranking by a heat diffusion on STRING network
5 Experiments
5.1 The other methods for comparison
5.2 26 WES retrospective study samples
5.3 AUC scores
5.4 A Case Study with Atypical Hemolytic Uremic Syndrome (aHUS) [14]
5.5 Divine under noise HPO queries
6 Reference

1 Workflow

S. Figure 1. Divine Workflow: Divine takes either a VCF file or patient phenotypes as HPO IDs. Divine annotates each variant with up to 30 databases and features at either the variant level or the gene level.
Divine also supports a discovery mode to infer genes that have never been associated with a certain disease model. Trio familial samples can be analyzed as well as a single proband sample. Divine assesses the pathogenicity of each gene by analyzing both the patient phenotype information and the genetic variants, and provides a prioritized gene ranking list, in Microsoft Excel format, as an annotation table.

2 Annotation

S. Table 1. Divine uses Varant [11] as an annotation framework. Originally, 22 features were available. In the release of Divine, eight new annotations are added or supported (the eight items at the bottom of the table).

1. dbSNP, 1000 Genomes Minor Allele Frequency (MAF) & ESP (MAF)
2. Clinically significant variants from ClinVar DB
3. GWAS Phenotype
4. Genomic region - Intergenic, Intronic, Exonic & UTR
5. Downstream and upstream gene for intergenic variants
6. Splice Site (Donor/Acceptor)
7. Mutation Type - NonSyn, Syn, StartGain, StartLoss, StopGain, StopLoss, SynStop
8. Codon Usage in Human
9. Exonic splice enhancer / silencer site
10. Flag variants that span a boundary region such as Intron-Exon or UTR-CDS
11. Distance of intronic variants from splice sites
12. UTR Functional Motifs
13. miRNA Binding Site
14. Polyphen2, SIFT & CADD prediction
15. Gene-Disease association - OMIM, NCBI-GAD
16. Position Conservation - Gerp++ Score
17. Interpro Domain
18. TFBS
19. eQTL
20. Low complexity region
21. Pseudo autosomal region
22. Capture region
23. COSMIC
24. HGMD (a license is required)
25. Gene Ontology and KEGG pathway
26. ExAC
27. ClinVitae
28. Protein domain pathogenicity
29. Amino acid change pathogenicity
30. Genetic model (autosomal recessive/recessive, homozygous, heterozygous, compound heterozygous)

3 Requirement

Divine requires either a standard-format VCF file or a text file of Human Phenotype Ontology (HPO) IDs that describe patients’ clinical features.
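A text file of HPO IDs could be read along the lines of the following minimal sketch. HPO term identifiers have the form `HP:` followed by seven digits; everything else here is an assumption made for illustration and is not Divine's documented input format: the one-ID-per-line layout, the `#` comment convention, and the function name `parse_hpo_ids` are all hypothetical.

```python
import re

# HPO term identifiers are written as "HP:" followed by seven digits.
HPO_ID_PATTERN = re.compile(r"^HP:\d{7}$")


def parse_hpo_ids(text):
    """Split text into valid HPO IDs and rejected tokens.

    Assumes (hypothetically) one entry per line; blank lines and lines
    starting with '#' are skipped. Duplicate IDs are kept only once.
    """
    valid, rejected = [], []
    for raw_line in text.splitlines():
        token = raw_line.strip()
        if not token or token.startswith("#"):
            continue
        # Tolerate stray whitespace after the colon, e.g. "HP: 0002803".
        normalized = re.sub(r"^HP:\s+", "HP:", token)
        if HPO_ID_PATTERN.match(normalized):
            if normalized not in valid:
                valid.append(normalized)
        else:
            # Free-text terms such as "Congenital contracture" end up here.
            rejected.append(token)
    return valid, rejected


if __name__ == "__main__":
    example = "HP:0002803\nHP: 0001250\nCongenital contracture\n"
    ids, bad = parse_hpo_ids(example)
    print("accepted:", ids)
    print("rejected:", bad)
```

Reporting rejected tokens separately, rather than silently dropping them, makes it easier to spot a phenotype description that still needs to be converted to an ID.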
3.1 Acceptable HPO inputs

It is very helpful to provide phenotype-to-disease associations from HPO [2], which allows for large-scale computational analysis of the human phenome. Currently, Divine only accepts HPO IDs (e.g., HP:0002803), not the terms or vocabulary (e.g., “Congenital contracture”) describing a patient's clinical feature. Several websites can convert a phenotypic description into an appropriate HPO ID: https://mseqdr.org/search_phenotype.php, http://compbio.charite.de/phenomizer, and https://hpo.jax.org/.

3.2 Input and output

When only HPO IDs are given, Divine generates an inferred disease list with associated genes. If a VCF file (with HPO IDs) is given, it generates an annotated variant table and an annotated inferred disease ranking list in Microsoft® Excel format. For the best result, provide both a VCF file (e.g., generated by the GATK germline variant caller [16]) and a set of HPO IDs. Divine mainly uses an existing annotation framework, Varant [11], which originally provided 22 annotations; we add eight new features in Divine (see the last eight items in S. Table 1).

3.3 Installation and manual

https://github.com/cjhong/divine

4 Methods:

4.1 Comparing known disease phenotypes with patient phenotypes

Given a patient query HPO set, $H=\{1,2,\ldots,m,\ldots,M\}$, we calculate a semantic similarity with each known disease phenotype ($j$) HPO set, $D_j=\{1,2,\ldots,n,\ldots,N\}$. In total, M by N term-to-term similarities ($s_{m,n}$) are available. We use the simRel [10] semantic measure defined in [s.eq2]. In the equation, $p_m$ indicates an information content of $m$ and CLA stands for a common lowest ancestor in the ontology graph. In order to summarize the M by N similarity matrix into a single value, $s(H, D_j)$, we use a method suggested in [20], adapted to our application. Two average values are computed, one from the maximum of each column and the other from the maximum of each row, and the larger of the two averages is taken.
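The term-to-term measure and the row/column summarization described above can be sketched as follows. This is a minimal sketch under stated assumptions, not Divine's implementation: the function names `sim_rel` and `best_match_average` are hypothetical, $p$ is read as the annotation probability of a term (the usual reading in simRel), and any further adjustment of the summarized value is left out.

```python
import math


def sim_rel(p_m, p_n, p_cla):
    """Term-to-term similarity in the form of [s.eq2].

    p_m, p_n and p_cla are treated as annotation probabilities in (0, 1]
    of term m, term n and their common lowest ancestor (an assumption).
    """
    denom = math.log(p_m) + math.log(p_n)
    if denom == 0.0:
        # Both terms have p = 1 (the ontology root): no information.
        return 0.0
    return (2.0 * math.log(p_cla) / denom) * (1.0 - p_cla)


def best_match_average(sim):
    """Summarize an M-by-N similarity matrix (list of rows) into one value.

    Averages the best match of every row and the best match of every
    column, then returns the larger of the two averages.
    """
    m = len(sim)
    n = len(sim[0])
    row_avg = sum(max(row) for row in sim) / m
    col_avg = sum(max(sim[i][j] for i in range(m)) for j in range(n)) / n
    return max(row_avg, col_avg)


if __name__ == "__main__":
    # Hypothetical 2-by-3 matrix: two patient terms against three disease terms.
    matrix = [
        [0.9, 0.1, 0.2],
        [0.3, 0.4, 0.8],
    ]
    print(best_match_average(matrix))
```

Taking the larger of the two averages favors the direction in which the smaller term set is well covered, so a short patient query is not penalized simply because the disease has many annotated phenotypes that were never queried.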
Symptoms or phenotypic descriptions related to a disease are incomplete and sparse, and the number of phenotypes describing a disease varies significantly among diseases. Thus, we penalize the maximum average value by dividing it by |M-N| in a log scale,

$$ s(H, D_j) = \frac{\max\left\{ \dfrac{\sum_{m} \max_{n} \{s_{m,n}\}}{M},\ \dfrac{\sum_{n} \max_{m} \{s_{m,n}\}}{N} \right\}}{\log(|M-N|+10)} \qquad \text{[s.eq1]} $$

$$ s_{m,n} = \frac{2\log(p_{CLA})}{\log(p_m)+\log(p_n)}\,(1-p_{CLA}) \qquad \text{[s.eq2]} $$

One gene can be associated with more than two diseases, which is often true when two diseases are very similar to each other. We retain only the maximum $s(H, D_j)$ among those and assign the phenotypic score to the gene $i$ directly associated with $D_j$,

$$ P_i = \max_{j \,\in\, \{j \,:\, D_j \text{ is associated with gene } i\}} s(H, D_j) \qquad \text{[s.eq3]} $$

4.2 Damage prediction from genetic information

Divine uses hg19 (i.e., GRCh37) as the reference genome sequence. By default, Divine filters out any variant outside of exonic regions or UTRs with +/- 20 bp flanking. Note that the user can change this option to handle either whole genome or targeted sequencing reads. As a gene model, we use the NCBI RefSeq gene annotation, containing 52,065 isoform transcripts across 26,668 genes. As described in the main manuscript, Divine filters out any variant frequently observed in a common population, where the user can define the MAF (Minor Allele Frequency) cutoff value. Divine predicts the pathogenicity of a gene from variants in a VCF file using the following three components: 1) a pathogenic likelihood by amino acid change predicted from known pathogenic databases, 2) an impact score by the variant location within a transcript, and 3) pathogenic density per active protein domain.

4.2.1 Pathogenic likelihood from AA

Taking positive controls from pathogenic variants that appear in ClinVar [15] or HGMD Professional [9], we train a beta distribution (i.e., a cumulative distribution function, FP[a]) of either Gerp++ [18] or CADD [19] scores for each amino acid change (a). Similarly, the other beta distribution (i.e., a cumulative distribution
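The default variant filtering described in Section 4.2 (keep variants within exonic or UTR regions padded by 20 bp, and drop variants that are common in the population) can be sketched as follows. This is an illustration only and does not reproduce Divine's code: the `Variant` record, the region tuples, the function names, the strict `<` comparison, and the cutoff value of 0.01 are all hypothetical, since the text leaves the MAF cutoff to the user.

```python
from collections import namedtuple

FLANK_BP = 20              # default flank stated in the text
DEFAULT_MAF_CUTOFF = 0.01  # hypothetical value; the cutoff is user-defined

# chrom: chromosome name, pos: 1-based position,
# maf: population minor allele frequency (0.0 if never observed)
Variant = namedtuple("Variant", ["chrom", "pos", "maf"])


def in_padded_region(variant, regions, flank=FLANK_BP):
    """True if the variant lies within `flank` bp of any region.

    `regions` is a list of (chrom, start, end) tuples with 1-based,
    inclusive coordinates (an assumed representation).
    """
    return any(
        chrom == variant.chrom and start - flank <= variant.pos <= end + flank
        for chrom, start, end in regions
    )


def keep_variant(variant, regions, maf_cutoff=DEFAULT_MAF_CUTOFF, flank=FLANK_BP):
    """Keep variants that are rare and inside a padded exonic/UTR region."""
    return variant.maf < maf_cutoff and in_padded_region(variant, regions, flank)


if __name__ == "__main__":
    exons = [("1", 1000, 1200)]  # hypothetical exon coordinates
    candidates = [
        Variant("1", 985, 0.001),   # inside the 20 bp flank, rare
        Variant("1", 979, 0.001),   # one base outside the flank
        Variant("1", 1100, 0.2),    # exonic but common
    ]
    for v in candidates:
        print(v, keep_variant(v, exons))
```

Passing `flank` and `maf_cutoff` as parameters mirrors the text's note that both settings can be changed, for example when handling whole genome or targeted sequencing data.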