The Somatic Mutation Landscape of the Human Body Pablo E

García-Nieto et al. Genome Biology (2019) 20:298 https://doi.org/10.1186/s13059-019-1919-5 RESEARCH Open Access The somatic mutation landscape of the human body Pablo E. García-Nieto, Ashby J. Morrison and Hunter B. Fraser* Abstract Background: Somatic mutations in healthy tissues contribute to aging, neurodegeneration, and cancer initiation, yet they remain largely uncharacterized. Results: To gain a better understanding of the genome-wide distribution and functional impact of somatic mutations, we leverage the genomic information contained in the transcriptome to uniformly call somatic mutations from over 7500 tissue samples, representing 36 distinct tissues. This catalog, containing over 280,000 mutations, reveals a wide diversity of tissue-specific mutation profiles associated with gene expression levels and chromatin states. For example, lung samples with low expression of the mismatch-repair gene MLH1 show a mutation signature of deficient mismatch repair. In addition, we find pervasive negative selection acting on missense and nonsense mutations, except for mutations previously observed in cancer samples, which are under positive selection and are highly enriched in many healthy tissues. Conclusions: These findings reveal fundamental patterns of tissue-specific somatic evolution and shed light on aging and the earliest stages of tumorigenesis. Keywords: Somatic mutation, Cancer, Aging, Genomic instability, Somatic evolution, Human Background more comprehensive understanding of somatic mutations In humans, somatic mutations play a key role in senescence across the human body has been limited by the small num- and tumorigenesis [1]. Pioneering work on somatic evolution ber of tissues studied to date. in cancer has led to the characterization of cancer driver Most studies on somatic evolution in healthy tissues genes [2] and mutation signatures [3]; the interplay between have sequenced DNA from biopsies to high coverage. chromatin, nuclear architecture, carcinogens, and the muta- However, the transcriptome also carries all the genomic tional landscape [4–7]; the evolutionary forces acting on information of a cell’s transcribed genome, in addition somatic mutations [8–11]; and clinical implications of som- to RNA-specific mutations or edits. RNA-seq has been atic mutations [12]. used to identify germline DNA variants [21], and re- Somatic mutations have been far less studied in healthy cently, single-cell (sc) RNA-seq was used to call DNA human tissues than in cancer. Early studies focused on blood somatic mutations in the pancreas of several people [22]. [13, 14] as it is readily accessible and because of the known To systematically identify somatic mutations in the effects of immune-driven somatic mutation. Recently, som- human body and to investigate their distribution and atic mutations have been characterized in tissues like the skin functional impact, we developed a method that leverages [15], brain [16, 17], esophagus [18, 19], and colon [20]. These the genomic information carried by RNA to identify studies confirmed that cells harboring certain mutations ex- DNA somatic mutations while avoiding most sources of pand clonally, and the number of clonal populations—as well false positives. We applied it to infer somatic mutations as the total number of somatic mutations—increases with across 36 non-cancerous tissues, allowing us to explore age. Additionally, recurrent positively selected mutations in the landscape of somatic mutations throughout the hu- specific genes (e.g., NOTCH1) were observed. However, a man body. * Correspondence: [email protected] Department of Biology, Stanford University, 371 Jane Stanford Way, Stanford, CA 94305, USA © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. García-Nieto et al. Genome Biology (2019) 20:298 Page 2 of 20 Results end of the spectrum are those with low environmental Somatic mutation calling across 36 non-disease tissues exposure or low cellular turnover such as the brain, ad- from 547 people renal gland, prostate, and several types of muscle—heart, Our method considers genomic positions where we ob- esophagus muscularis, and skeletal muscle (Fig. 2a). served two alleles in the RNA-seq reads and assesses We observed similar trends across all six possible whether they are likely to be bona fide DNA somatic point mutation types (complementary mutations—such mutations (Fig. 1a). We optimized the minimum levels as C>T and G>A—were collapsed because when one is of RNA-seq read depth, sequence quality, and number detected the other is always present on the other DNA of reads supporting the variant allele to limit the impact strand) (Additional file 1: Figure S3a). C>T and T>C of sequencing errors (Additional file 1: Figure S1a; see mutations were the most abundant types, followed by the “Methods” section). We then applied extensive filters C>A (Additional file 1: Figure S3b; Additional file 7: to eliminate false positives from biological and technical Table S5). sources, including RNA editing, sequencing errors, and mapping errors (see the “Methods” section; Fig. 1b, c; Mutation load is affected by age, sex, ethnicity, and Additional file 1: Figure S2a; Additional file 2: Supple- natural selection mentary Tables legends and Additional file 3: Table S1). After controlling for all known technical factors (see the To validate the method, we compared somatic muta- “Methods” section), we observed that age was positively cor- tion calls from 105 blood RNA-seq samples to exome related with mutation load across most tissues (Add- DNA sequencing performed on the same samples (GTEx itional file 1: Figure S4a; 29/36 tissues with Spearman ρ >0, − Consortium, 2017) (Fig. 1d). We observed a false- binomial p =1.6×10 4).Bloodshowedthemostsignificant discovery rate (FDR) of 29% which represents the per- age association for all mutation types combined (Fig. 2b; centage of somatic mutations called from RNA-seq not 1st–4th-quartile Wilcoxon p = 0.013; Spearman ρ =0.21, having evidence in the corresponding DNA exome-seq Benjamini-Hochberg [BH] FDR < 0.0001), whereas the sun- sample (Fig. 1d, Additional file 1: Figure S1c; see the exposed skin was the most significant for C>T mutations, “Methods” section). Mutations with a higher variant the most common mutation associated with UV radiation allele frequency (VAF) had an overall lower FDR (Add- (1st–4th-quartile Wilcoxon p = 0.00041; Spearman ρ =0.17 itional file 1: Figure S1d); accounting for this trend BH-FDR = 0.02). We observed other strong age associations yielded an estimated 34% FDR for our complete set of in several brain regions, and in particular the basal ganglia mutation calls (see the “Methods” section). This is com- (Additional file 1: Figure S4a), supporting the hypothesis that parable to the 40% FDR in a previous study that inferred somatic mutation load could contribute to the increasing risk mutations from scRNA-seq in the pancreas [22]. of neurodegenerative diseases with age [23]. After applying the pipeline and filters to RNA-seq data In addition to age, other biological factors also contrib- from the GTEx project, we retained a total of 7584 sam- uted to somatic mutations. For example, women show a ples from 36 different tissues and 547 different individ- much greater mutation load in breast than men (Fig. 2b; − uals with no detectable cancer (Additional file 4: Table Wilcoxon p =1.7×10 8). We also observed female-biased S2). This resulted in a total of 280,843 unique mutations mutations (BH-FDR < 0.1) in the subcutaneous adipose, (Additional file 5: Table S3), most of which were rare visceral adipose, liver, and adrenal gland; in contrast, we across samples (median frequency = 0.026% of samples; found no significant male-biased mutations after multiple Additional file 1: Figure S2b) and across individuals (me- hypothesis correction (Additional file 1: Figure S4c; Add- dian frequency = 0.37% of individuals; Additional file 1: itional file 8: Table S6). Ethnicity can also affect mutation Figure S2c). We found no enrichment of mutations with rates: we found a significant increase of C>T mutations in VAF close to 0.5 (Additional file 1: Figure S1e), suggest- Caucasian sun-exposed skin compared to non-exposed ing that there is little contamination of heterozygous skin, but no corresponding difference in African- germline variants. Americans (Fig. 2c, Additional file 1:FigureS4b),likely We first investigated the factors influencing mutation due to protection against UV radiation provided by higher counts per sample and tissue (see the “Methods” sec- melanin content. Finally, we observed that the number of tion). The main contributor was sequencing depth and stem cell divisions [24] in a tissue was weakly correlated to a lesser extent other biological and technical factors with mutation load (Additional file 9: Note S1, Add- (Additional file 1: Figure S2d, Additional file 6: Table itional file 1: Figure S5). S4). Tissues that have more mutations than expected To investigate the role of natural selection on somatic from sequencing depth include those most often ex- mutations, we examined the variant allele frequency posed to environmental mutagens or with a high cellular (VAF) within every sample where a given mutation was turnover like the skin, lung, blood, esophagus mucosa, observed. We expected that most mutations we observed spleen, liver, and small intestine (Fig. 2a). On the other were present in clonal expansions of cells, since García-Nieto et al. Genome Biology (2019) 20:298 Page 3 of 20 Fig. 1 (See legend on next page.) García-Nieto et al.

Load more