Genomic Landscape of Human Allele-Specific DNA Methylation

Fang Fangᵃ, Emily Hodgesᵇ, Antoine Molaroᵇ, Matthew Deanᵃ, Gregory J. Hannonᵇ, and Andrew D. Smithᵃ,¹

ᵃMolecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California; and ᵇHoward Hughes Medical Institute, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724

Edited by Wing Hung Wong, Stanford University, Stanford, CA, and approved March 8, 2012 (received for review January 24, 2012)

BIOPHYSICS AND COMPUTATIONAL BIOLOGY

DNA methylation mediates imprinted gene expression by passing an epigenomic state across generations and differentially marking specific regulatory regions on maternal and paternal alleles. Imprinting has been tied to the evolution of the placenta in mammals and defects of imprinting have been associated with human diseases. Although recent advances in genome sequencing have revolutionized the study of DNA methylation, existing methylome data remain largely untapped in the study of imprinting. We present a statistical model to describe allele-specific methylation (ASM) in data from high-throughput short-read bisulfite sequencing. Simulation results indicate technical specifications of existing methylome data, such as read length and coverage, are sufficient for full-genome ASM profiling based on our model. We used our model to analyze methylomes for a diverse set of human cell types, including cultured and uncultured differentiated cells, embryonic stem cells and induced pluripotent stem cells. Regions of ASM identified most consistently across methylomes are tightly connected with known imprinted genes and precisely delineate the boundaries of several known imprinting control regions. Predicted regions of ASM common to multiple cell types frequently mark noncoding RNA promoters and represent promising starting points for targeted validation. More generally, our model provides the analytical complement to cutting-edge experimental technologies for surveying ASM in specific cell types and across species.

Genomic imprinting refers to genes that are preferentially expressed from either the maternal or paternal allele without genotype dependence (1). In mammals, such parent-of-origin gene expression is believed to have evolved along with the placenta, serving to mediate resource distribution between a mother and her offspring (2, 3), though other theories have been proposed (4–6). The connection between imprinting and DNA methylation was uncovered shortly after the first identification of imprinted genes in mammals (7). Imprinted gene expression, in all known cases, is regulated by allele-specific methylation (ASM) of some cis-acting regulatory regions. We use the term allelically methylated region (AMR) in reference to any genomic interval of ASM, whether or not it is associated with imprinted regulation. Typically, an entire imprinted locus is organized as a cluster and regulated by an imprinting control region (ICR) and several other AMRs. The allelic methylation patterns of ICRs are set during gametogenesis and stably maintained throughout somatic development in the offspring (8), irrespective of gene expression levels. The remaining AMRs may be established after fertilization (9), possibly under the control of nearby ICRs or other epigenetic signals.

The identification of imprinted genes and a detailed understanding of their regulation has become increasingly important, along with the realization that aberrant genomic imprinting contributes to several complex diseases (10). Much effort has been directed toward locating imprinted genes using expression screen-based approaches (11, 12). One limitation of such approaches is that many imprinted genes may only show allele-specific expressions in particular tissues at appropriate developmental stages (13). ASM screen-based approaches might overcome the effect of temporal and spatial expression patterns because the ICRs are expected to exist through developmental stages preceding the context in which they become active. Such methods have been successfully applied to identify unique imprinted genes (14–17).

Advances in DNA sequencing technology have been leveraged for high-throughput identification of imprinted genes. The "BS-seq" technology couples bisulfite treatment with high-throughput short-read sequencing, and has enabled genome-wide profiling of DNA methylation in mammalian genomes at single-CpG (cytosine guanine dinucleotide) resolution (18). Li et al. (19) produced a methylome from peripheral blood of a single individual and recognized the potential of using such data to profile ASM. They employed a method based on associating heterozygous SNPs with differential methylation, and identified hundreds of ASM regions. Methods such as this, however, must be applied to data from a single individual and for which matching genotypic data are available. There are two shortcomings of approaches that depend on genotype. First, they can be confounded by ASM that is associated with genotype, but which may not have any regulatory effect. The amount of ASM typically associated with genotype is not well understood, but recent reports suggest it is significant (20). More importantly, because imprinted methylation is not necessarily associated with genotypic variation, these methods will be inherently blind to some portion of ASM.

We present a probabilistic model to describe ASM based on data from BS-seq experiments. Our model is independent of genotype, and therefore has broad applicability to identify ASM in the context of imprinting. In essence, our model describes the degree to which methylation states in reads appear to reflect two distinct patterns, each pattern representing roughly half the data. We validated our method using semisimulated data in which methylation states were simulated within actual reads from BS-seq experiments. Our results indicate that technical characteristics of existing public methylomes (i.e., read length and coverage) are sufficient to accurately identify AMRs. By applying our model to 22 human methylomes, emphasizing those from uncultured cells, we identified a set of candidate AMRs involved in imprinted gene regulation. Candidates consistently identified across methylomes display remarkable concordance with known imprinted genes and allow boundaries of known AMRs to be precisely defined. Many candidates not associated with known imprinted genes mark the promoters of long noncoding RNAs (lncRNAs) and are also supported by similar analyses at orthologous regions in chimp; these provide a starting point for identifying additional imprinted genes, ICRs, and possibly imprinted clusters. Our model, therefore, is an essential analytical complement to recently emerged experimental methods for understanding the role of DNA methylation in genomic imprinting.

Author contributions: F.F., E.H., A.M., M.D., G.J.H., and A.D.S. designed research; F.F., E.H., A.M., and A.D.S. performed research; F.F. and A.D.S. analyzed data; and F.F. and A.D.S. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

¹To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1201310109/-/DCSupplemental.

PNAS Early Edition. www.pnas.org/cgi/doi/10.1073/pnas.1201310109

Modeling Allele-Specific Methylation in BS-Seq Data

We begin this section with a verbal description of our question and the main issues that are addressed by our model. We assume any read has been sequenced after bisulfite treatment and mapped uniquely to the reference genome. Because we are interested in mammalian methylation, we restrict our attention to CpG sites both in the genome and in the reads. Reads not mapping over a CpG are ignored. Our goal is to identify intervals of the genome where it appears that the two alleles have different methylation patterns; typically, in such a case, one allele will be highly methylated and the other not. There are two kinds of important information our model must capture: (i) The set of reads mapping into the interval should appear to represent two distinct methylation patterns, and (ii) the subsets of reads corresponding to those two patterns should be in roughly equal proportions because the alleles themselves are present in equal proportions. One can consider a methylation pattern as analogous to a haplotype, but with a strong stochastic component. Therefore, reads that contain only a single CpG will provide us with relatively little

[...]

$$L_2(\Theta \mid R, \gamma) = 0.5^{|R|} \prod_{i=1}^{n} \prod_{j=1}^{2} \theta_{ij}^{m(\gamma_j, i)} (1 - \theta_{ij})^{u(\gamma_j, i)}, \qquad [3]$$

where the m and u are as defined for Eq. 1. Because the allele of origin for each read is missing data, we fit the two-allele model using expectation maximization (21), obtaining expectations on membership in γ1 and γ2. Details are provided in SI Text.

Identifying Intervals of Allele-Specific Methylation. We use Bayesian information criterion (BIC) (22) as a model selection criterion in determining whether a fixed interval is best described using a single-allele [Eq. 1] or two-allele [Eq. 2] model. A single-allele model has one parameter for each of the n CpGs, and the number of observations is equal to |R|:

$$\mathrm{BIC}(\mathrm{single}) = n \ln |R| - 2 \ln L_1(\Theta \mid R). \qquad [4]$$

For the two-allele model, there are two parameters for each CpG:
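
As an illustration of the model comparison described in this section, the following Python sketch fits a single-allele and a two-allele model to the reads in one interval and compares them by BIC. It is a minimal sketch under stated assumptions, not the authors' implementation: reads are encoded as hypothetical strings over "M" (methylated), "U" (unmethylated) and "-" (CpG not covered); the single-allele likelihood is assumed to be a product of θ^m (1 − θ)^u terms over CpGs, since Eq. 1 is not shown in this excerpt; the two-allele BIC penalty is assumed to be 2n ln |R|, following the statement that the model has two parameters per CpG; and the two-allele likelihood used in the BIC is the mixture likelihood with each allele weighted 0.5. All function names, the clamping constant, and the EM starting point are illustrative choices.

```python
import math

EPS = 1e-6  # assumption: keeps estimated methylation levels away from exactly 0 or 1


def clamp(p):
    return min(1.0 - EPS, max(EPS, p))


def read_log_prob(read, theta):
    """Log-probability of one read given per-CpG methylation levels theta."""
    total = 0.0
    for state, t in zip(read, theta):
        if state == "M":
            total += math.log(t)
        elif state == "U":
            total += math.log(1.0 - t)
    return total  # CpGs marked "-" (not covered) contribute nothing


def fit_levels(reads, weights):
    """Weighted fraction of methylated observations at each CpG."""
    theta = []
    for i in range(len(reads[0])):
        meth = sum(w for r, w in zip(reads, weights) if r[i] == "M")
        cov = sum(w for r, w in zip(reads, weights) if r[i] in "MU")
        theta.append(clamp(meth / cov) if cov > 0 else 0.5)
    return theta


def bic_single(reads):
    """BIC of the single-allele model: n ln|R| - 2 ln L1 (Eq. 4)."""
    theta = fit_levels(reads, [1.0] * len(reads))
    log_l1 = sum(read_log_prob(r, theta) for r in reads)
    return len(theta) * math.log(len(reads)) - 2.0 * log_l1


def bic_pair(reads, iterations=50):
    """BIC of the two-allele model fit by EM (assumed penalty: 2n ln|R|)."""
    base = fit_levels(reads, [1.0] * len(reads))
    theta1 = [clamp(t + 0.25) for t in base]  # arbitrary asymmetric start
    theta2 = [clamp(t - 0.25) for t in base]
    for _ in range(iterations):
        # E-step: expected membership of each read in allele 1 (prior 0.5 each)
        resp = []
        for r in reads:
            a, b = read_log_prob(r, theta1), read_log_prob(r, theta2)
            top = max(a, b)
            pa, pb = math.exp(a - top), math.exp(b - top)
            resp.append(pa / (pa + pb))
        # M-step: re-estimate per-allele methylation levels
        theta1 = fit_levels(reads, resp)
        theta2 = fit_levels(reads, [1.0 - w for w in resp])
    log_l2 = 0.0
    for r in reads:
        a, b = read_log_prob(r, theta1), read_log_prob(r, theta2)
        top = max(a, b)
        log_l2 += top + math.log(0.5 * math.exp(a - top) + 0.5 * math.exp(b - top))
    return 2 * len(theta1) * math.log(len(reads)) - 2.0 * log_l2


def prefers_two_alleles(reads):
    """True when the two-allele model has the lower (better) BIC."""
    return bic_pair(reads) < bic_single(reads)


if __name__ == "__main__":
    mixed = ["MMMM"] * 10 + ["UUUU"] * 10   # two distinct patterns, equal proportions
    uniform = ["MMMM"] * 20                 # one pattern only
    print(prefers_two_alleles(mixed), prefers_two_alleles(uniform))
```

In this toy input, the interval with ten fully methylated and ten fully unmethylated reads should favor the two-allele model, while the interval with uniformly methylated reads should favor the single-allele model because the extra parameters are penalized without improving the fit.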