View a Copy of This Licence, Visit
Total Page:16
File Type:pdf, Size:1020Kb
Boroni et al. Clinical Epigenetics (2020) 12:105 https://doi.org/10.1186/s13148-020-00899-1 RESEARCH Open Access Highly accurate skin-specific methylome analysis algorithm as a platform to screen and validate therapeutics for healthy aging Mariana Boroni1,2* , Alessandra Zonari2, Carolina Reis de Oliveira2, Kallie Alkatib2, Edgar Andres Ochoa Cruz2, Lear E. Brace2 and Juliana Lott de Carvalho2,3,4 Abstract Background: DNA methylation (DNAm) age constitutes a powerful tool to assess the molecular age and overall health status of biological samples. Recently, it has been shown that tissue-specific DNAm age predictors may present superior performance compared to the pan- or multi-tissue counterparts. The skin is the largest organ in the body and bears important roles, such as body temperature control, barrier function, and protection from external insults. As a consequence of the constant and intimate interaction between the skin and the environment, current DNAm estimators, routinely trained using internal tissues which are influenced by other stimuli, are mostly inadequate to accurately predict skin DNAm age. Results: In the present study, we developed a highly accurate skin-specific DNAm age predictor, using DNAm data obtained from 508 human skin samples. Based on the analysis of 2,266 CpG sites, we accurately calculated the DNAm age of cultured skin cells and human skin biopsies. Age estimation was sensitive to the biological age of the donor, cell passage, skin disease status, as well as treatment with senotherapeutic drugs. Conclusions: This highly accurate skin-specific DNAm age predictor constitutes a holistic tool that will be of great use in the analysis of human skin health status/molecular aging, as well as in the analysis of the potential of established and novel compounds to alter DNAm age. Keywords: Aging, Skin aging, Epigenetics, DNA methylation, Fibroblasts, Molecular clock, DNAm age algorithm Background hallmarks of aging, emerging as active areas of investiga- Aging is defined as a complex, multifactorial process asso- tion [1]. Age-related epigenetic alterations have gained ciated with functional decline of organs and tissues, lead- prominence in this context since the observation that ing to increased chances of death [1]. Currently, genomic DNA methylation (DNAm) undergoes predictable time- instability, telomere attrition, epigenetic alterations, loss of dependent modifications, which can be explored as a proteostasis, deregulated nutrient sensing, mitochondrial highly accurate method to estimate the molecular versus dysfunction, cellular senescence, stem cell exhaustion, and the chronological age of human tissues [2]. altered intercellular communication are considered the Recently, additional factors beyond time have been shown to influence DNAm age, such as genetic diseases [3, 4], in- * Correspondence: [email protected] fectious diseases [5], inflammatory disorders [6, 7], and life- 1Bioinformatics and Computational Biology Lab, Division of Experimental and style factors, such as exercise and obesity [8], to cite a few. Translational Research, Brazilian National Cancer Institute, Rio de Janeiro, RJ DNAm data was also significantly correlated to mortality 20231-050, Brazil 2OneSkin Technologies, San Francisco, USA Full list of author information is available at the end of the article © The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Boroni et al. Clinical Epigenetics (2020) 12:105 Page 2 of 16 risk [9]. Therefore, DNAm has evolved from a chrono- Results logical age estimation tool to an indicator of overall health. Dataset description Cellular senescence is an evolving concept and is cur- The utilized workflow is depicted in Additional File 1– rently defined as a cellular state characterized by four Fig S1. For the algorithm development, we analyzed pre- main aspects: an irreversible cell-cycle arrest, an inflam- viously published DNAm data of human skin biopsies, matory senescence-associated secretory phenotype retrieved from three datasets. The specific datasets, (SASP), macromolecular damage (DNA, protein and GSE51954, E-MTAB-4385, GSE90124, are available in lipid damage), and altered metabolism [10]. Other as- public databanks (GEO and ArrayExpress) and each pects attributed to senescent cells include apoptosis re- comprise epigenetic data, as well as additional informa- sistance, senescence-associated heterochromatin foci, tion about the analyzed samples, i.e., origin tissue, donor and morphological alterations. Even though cellular sen- sex, and chronological age (18–95 years old). A total of escence and organismal aging are not synonymous, re- 508 samples (40 derived from the dermis, 146 from the cent data supports the fact that DNAm aging observed epidermis, and 322 samples derived from whole skin tis- in vivo also occurs in vitro in senescent cell cultures sue) were interrogated for whole-genome methylation (e.g., mammalian cell cultures) in a highly reproducible levels by more than 450,000 CpG probes per sample. pattern [11, 12]. Indeed, DNAm of cultured cells can be The main characteristics of the cohort are described in used to predict cellular passage, regardless of donor Additional File 2–Supplementary Table S1. chronological age [13]. In this context, even though the mechanism by which cellular passage and time promote Data normalization and pre-selection of features DNAm alterations in cells/tissues has not been clarified, Each dataset was individually processed for quality con- a few functional correlations between DNA methylation trol and merged for preprocessing in order to build a and phenotype have been described. For instance, as machine learning algorithm able to accurately predict shown by Xie et al. [14], during replicative cellular sen- DNAm age. After normalizing all datasets by quantile escence, hypermethylation occurs mainly in promoter (Additional File 3–Fig S2), we obtained a homogeneous sites of genes responsible for cellular biosynthesis and dataset with 397,598 probes. metabolism regulation, promoting, or at least favoring, a We then removed 1720 cross-reactive probes [20]and gradual decrease of biosynthetic processes observed in 26,490 probes which were not present in the new version senescent cells. Importantly, not only epigenetic alterations of the EPIC array. Probes targeting sex chromosomes were are shared between cellular senescence and organismal also removed, resulting in 369,388 probes. A feature selec- aging but also other molecular and phenotypic aspects, in- tion step was performed to reduce the dimensionality of cluding telomere attrition and slower renewal rate [2]. our dataset using a package as a wrapper for three differ- These findings fuel the current perception that cellular sen- ent algorithms implementations (Additional File 4–Fig escence is a good model to be investigated in aging studies. S3). Since the number of features is much greater than the The possibility to study aging in vitro initiates the oppor- samples in our dataset (the curse of dimensionality), this tunity to apply DNAm age as a parameter to screen and/or is an important step to reduce a model’s overfitting, i.e., validate potential investigational senotherapeutic com- highly accurate on training data but poor generalization pounds, defined as molecules targeting senescent cells, ei- on unseen test data, while improving its accuracy if a ther by promoting their death (senolytics), or altering the proper feature' subset is chosen [21]. By reducing the senescence status (senomorphics) [15]. complexity of a model, we also reduce the time of training. DNAm aging has been characterized in numerous tis- Each different algorithm ranked the 369,388 probes ac- sues, including the skin [16]. Nevertheless, perhaps due to cording to their importance to predict the sample’sage. the high influence of environmental factors on skin aging The wrapper also ranked the probes according to the [17], the pan-tissue algorithm developed to estimate tissue union-importance, which was calculated considering the aging failed to accurately calculate the chronological age results of the three algorithms together. We first retrieved of cultured fibroblasts [2]. Also, phenotypic skin analysis the top 100 probes ranked according to each different al- failed to relate to blood DNAm age [18]. In recognition of gorithm and the top 2000 probes ranked according to the the limited accuracy of the pan-tissue estimator of DNAm union-importance. We also added-up the top 400 probes age, Horvath