City Research Online

City, University of London Institutional Repository

Citation: Noble, R. ORCID: 0000-0002-8057-4252, Kaltz, O. and Hochberg, M. E. (2015). Peto's paradox and human cancers. Philosophical Transactions of the Royal Society B: Biological Sciences, 370(1673), 20150104.. doi: 10.1098/rstb.2015.0104

This is the accepted version of the paper.

This version of the publication may differ from the final published version.

Permanent repository link: https://openaccess.city.ac.uk/id/eprint/24709/

Link to published version: http://dx.doi.org/10.1098/rstb.2015.0104

Copyright: City Research Online aims to make research outputs of City, University of London available to a wider audience. Copyright and Moral Rights remain with the author(s) and/or copyright holders. URLs from City Research Online may be freely distributed and linked to.

Reuse: Copies of full items can be used for personal research or study, educational, or not-for-profit purposes without prior permission or charge. Provided that the authors, title and full bibliographic details are credited, a hyperlink and/or URL is given for the original metadata page and the content is not changed in any way.

City Research Online: http://openaccess.city.ac.uk/ [email protected] Submitted to Phil. Trans. R. Soc. B - Issue

Peto's paradox and human cancers

Journal: Philosophical Transactions B For ReviewManuscript ID: RSTB-2015-0104.R1 Only Article Type: Research

Date Submitted by the Author: 26-Apr-2015

Complete List of Authors: Noble, Robert; University of Montpellier II, Institute of Evolutionary Sciences Hochberg, Michael; University of Montpellier II, Institute of Evolutionary Sciences; Santa Fe Institute, Kaltz, Oliver; University of Montpellier II, Institute of Evolutionary Sciences

Issue Code: Click here to find the code for your issue.:

Subject: Evolution < BIOLOGY

cancer, stem cells, carcinogenesis, Peto's Keywords: paradox, environment, disease

http://mc.manuscriptcentral.com/issue-ptrsb Page 2 of 21 Submitted to Phil. Trans. R. Soc. B - Issue

1 2 3 4 5 6 7 8 9 10 11 Peto’s paradox and human cancers 12 13 Robert Noble1, Oliver Kaltz1, and Michael E Hochberg∗1,2 14 15 1Institut des Sciences de l’Evolution, Universit´eMontpellier II, 16 Place E Bataillon, 34095 Montpellier Cedex 5, France 17 2Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, 18 For ReviewUSA Only 19 20 21 22 23 Abstract 24 Peto’s paradox is the lack of the expected trend in cancer incidence 25 as a function of body size and lifespan across species. The leading hy- 26 pothesis to explain this pattern is natural selection for differential cancer 27 prevention in larger, longer-lived species. We evaluate whether a similar 28 effect exists within species, specifically humans. We begin by reanalyzing a recently published dataset to separate the effects of stem cell number 29 and replication rate, and show that each has an independent effect on 30 cancer incidence. When considering the lifetime number of stem cell di- 31 visions in an extended dataset, and removing cases associated with other 32 diseases or carcinogens, we find that lifetime cancer incidence per tissue 33 saturates at approximately 0.3-1.3% for the types considered. We fur- 34 ther demonstrate that grouping by anatomical site explains most of the 35 remaining variation. Our results indicate that cancer risk depends not only on number of stem cell divisions but varies enormously (∼10,000 36 times) depending on anatomical site. We conclude that variation in risk 37 of human cancer types is analogous to the paradoxical lack of variation in 38 cancer incidence among animal species, and may likewise be understood 39 as a result of evolution by natural selection. 40 Keywords. cancer, stem cells, carcinogenesis, Peto’s paradox, environ- 41 ment, disease 42 43 44 Introduction 45 All else being equal, the probability of obtaining at least a single invasive cancer 46 over an organism’s lifespan should scale with the number of “targets” – that is, 47 cell number, cell lifespan, and the number of cell divisions – over a lifetime. 48 Peto’s paradox is the lack of such a relationship [1], and is good evidence that 49 over the millions of generations of multicellular evolution, natural selection has 50 provided species with levels of cancer protection that are proportional to their 51 body masses and lifespans ([2, 3, 4]; Aktipis et al. [5], this issue). Several 52 articles in this Special Issue present background and new results on some of 53 the causes, protection mechanisms and emergent patterns. Far less studied is

54 ∗ 55 Corresponding author. Email: [email protected] 56 57 1 58 59 60 http://mc.manuscriptcentral.com/issue-ptrsb Submitted to Phil. Trans. R. Soc. B - Issue Page 3 of 21

1 2 3 4 5 6 7 the extent to which this phenomenon, is obtained within a species [6]. That 8 is, do different cell lineages within an organism show differing levels of cancer 9 protection based on their relative vulnerabilities (e.g., total stem cell number, 10 stem cell division rate, mutagenic exposure), and associated cancer mortality 11 risks for the whole organism? To the extent that cancer is a selective force 12 on differential resistance between cell lineages, such resistance, being a priori 13 costly to evolve, may also result in constraints on the evolution of body plans, 14 an aspect of Peto’s paradox that has rarely been investigated (Boddy et al. [7], 15 this issue; Kokko and Hochberg [8], this issue). 16 Here, we use the lens of evolution to reevaluate and reinterpret the data and 17 results of a recent study by Tomasetti and Vogelstein [9] (hereafter T&V), who 18 exploredFor possible sources Review of variation in risk between Only cancer types. We show 19 that c. 50% of variation in cancer risk is due to tissue size, indicating that, inde- 20 pendent of stem cell divisions, larger tissues are more likely to harbour cancers 21 than smaller tissues. A simpler measure of T&V’s Extra Cancer Risk (a classifi- 22 cation of cancers most likely to be caused by carcinogens) yields similar findings 23 to these authors, with some notable differences. Moreover, when using the total 24 number of stem cell divisions as a metric for cancers that are not typically the re- 25 sult of disease or carcinogenic exposure, and employing additional data sources, 26 we find that lifetime cancer incidence per tissue plateaus at approximately 0.3- 27 1.3% for the cases in an augmented dataset. We further demonstrate that most 28 of the remaining variation in cancer risk can be explained by grouping cancers by anatomical site (e.g., pancreas, bone, intestine), and that each site has a 29 very different risk per stem cell division. Our study indicates new directions 30 for research in showing how tissue characteristics may independently explain 31 variation in cancer incidence. We suggest that evolution by natural selection 32 has occurred on cancer prevention at different anatomical sites in humans as the 33 underlying driver of overall pattern, analogous to variation in cancer incidence 34 across species, i.e. Peto’s paradox. 35 36 37 Results 38 39 Tomasetti and Vogelstein [9] found a correlation between cancer incidence per 40 tissue and the lifetime number of stem cell divisions within the tissue for a set of 41 31 cancer types. They concluded that most of the variation in risk among cancer 42 types could be explained by random mutations and repair errors during cell 43 replication. Furthermore, the authors assessed the remaining variation in cancer 44 risk due to external environment and inherited factors, which they quantified with an Extra Risk Score (ERS). Cancers with high ERS are indeed known to 45 be associated with carcinogenic exposure (Fig. 2 in [9]). Below we reinterpret 46 their results through new statistical analyses and then reanalyse their dataset 47 with several new additions to assess signatures of natural selection for cancer 48 prevention. 49 50 51 Independent contributions of division rate and stem cell 52 number 53 Tomasetti and Vogelstein calculated the total lifetime number of stem cell divi- 54 sions (lscd) as s(2+d)−2, where s is the size of the organ’s stem cell population, 55 56 57 2 58 59 60 http://mc.manuscriptcentral.com/issue-ptrsb Page 4 of 21 Submitted to Phil. Trans. R. Soc. B - Issue

1 2 3 4 5 6 7 and d is the lifetime number of divisions per stem cell. They then tested for a 8 correlation between lscd and cancer incidence. One way in which their analy- 9 sis could be extended is to differentiate the individual contributions of s and d 10 to cancer risk [10, 11]. For instance, a small stem cell population with many 11 replications (e.g., esophageal cells) may have the same lscd as a large stem cell 12 population with few replications (e.g., lung cells, see Table S1 in [9]). How- 13 ever, in the former case, cancer risk may result mainly from replication errors, 14 while the latter has a considerably larger number of cells potentially exposed to 15 carcinogenic environments at any point in time. 16 We conducted a multiple regression analysis with cancer incidence as the 17 response variable, and log(d+1), log s, and the interaction of log(d+1) and log s 18 as the explanatoryFor variables. Review There was no significant Only correlation between the 19 two explanatory variables (r = 0.16,n = 31,p> 0.3), indicating that variation 20 in s is largely independent of variation in d. The multiple regression revealed 21 significant positive effects of both log s and log(d + 1) on cancer risk (F1,27 > 22 18,p< 0.0002); the interaction between log s and log(d + 1) was not significant 23 (F1,27 = 0.21,p > 0.6). Overall, as expected, the model explained 65% of the 24 variation in cancer risk, which is identical to the estimate for the composite 25 lscd in Tomasetti and Vogelstein [9]. To test the effect of stem cell number on 26 cancer risk, independent of division rate, we first regressed cancer risk on s, 27 and then performed a partial regression of residual cancer risk on d. The aim 28 was to remove effects of stem cell number and so obtain the “pure” effect of division rate. When correcting for effects of s in this way, stem cell division 29 explains 40% of the variation in risk (Figure 1A). This division rate effect is 30 weaker than that found by T&V, because our analysis is based on replications 31 per stem cell rather than over the population of stem cells. Conversely, tissue 32 stem cell number explains 44% of the variation after correcting for stem cell 33 division rate (Figure 1B). Figure 1C depicts the combined positive effects of 34 log(d+1) and log s: cancer risk increases with both increasing stem cell number 35 and replication in the organ. 36 Based on our regression model, we propose a simple alternative evaluation 37 of the replication-independent Extra Cancer Risk (ERS) score. Whereas T&V 38 calculate the ERS as the product of the logarithms of lifetime risk and total 39 stem cell replications, we use the residual lifetime risk, describing the difference 40 between observed and predicted values from the regression (Figure 1C). Like the 41 ERS, our more intuitive method identifies a subset of cancers that occur more 42 often than we would expect from the lifetime number of stem cell divisions, in- 43 cluding most of those that T&V classed as deterministic D-tumours (Figure 2, 44 blue bars). Of equal importance for understanding possible causation, the resid- 45 ual lifetime risk also quantifies the extent to which some cancer rates are lower 46 than expected. Carcinomas of the small intestine, duodenum and pancreas are 47 more than ten times less frequent than one would predict from the total number 48 of stem cell divisions (Figure 2, red bars), and these three cancer types appear 49 as outliers in the residuals distribution and quantile-quantile plots (not shown). 50 We note that very similar results can be obtained using the residuals from the 51 regression of risk against lscd, as has been proposed by Tomasetti and Vogel- 52 stein [12] and Altenberg [10] since the publication of Tomasetti and Vogelstein’s 53 initial article [9]. 54 55 56 57 3 58 59 60 http://mc.manuscriptcentral.com/issue-ptrsb Submitted to Phil. Trans. R. Soc. B - Issue Page 5 of 21

1 2 3 4 5 6 7 The saturation of cancer risk 8 9 The above analysis separating the effects of s and d revealed significant, in- 10 dependent effects of these variables in explaining variation in cancer incidence 11 in the T&V dataset. The relatively shallow gradients of the linear regression 12 models present a challenge to the hypothesis that the variation in cancer risk 13 is largely due to differences in lifetime numbers of stem cell divisions. We next 14 consider in more detail the shape and interpretation of the relation between the 15 composite index, lscd, and cancer incidence. 16 If the risk per cell division were the same for every tissue then 17 cancer risk = 1 − (1 − C)lscd (1) 18 For Review Only 19 where C is the risk per stem cell division. If C ≪ 1 and cancer risk ≪ 1 (which 20 holds for almost all of the T&V data) then this relationship can be re-expressed 21 (using the binomial approximation) as 22 ≈ × 23 cancer risk C lscd. (2) 24 Equivalently, 25 log(cancer risk) ≈ log(lscd) + log(C), (3) 26 27 which means that the slope of the linear regression models should be approxi- 28 mately 1. Since the gradient of the one-factor linear regression model of T&V is only 0.53, the risk per stem cell division cannot be the same for all tissues. 29 Indeed, the risk per stem cell division decreases approximately linearly with 30 lscd, as shown in Figure 3. The unexpectedly shallow gradient of the correla- 31 tion between cancer risk and lscd has been noted before [12, 10] but has not, in 32 our view, been sufficiently investigated. 33 We propose that the observed relationship between cancer risk and number 34 of stem cell divisions can be partly explained by the saturation of cancer risk 35 at a maximum level substantially less than 100%. There are two (non-mutually 36 exclusive) reasons to expect such a saturating effect. First, different causes of 37 mortality (e.g., cancers, heart disease, cerebrovascular disease, accidents, etc.) 38 each have a characteristic probability distribution as a function of age. Be- 39 cause of the primacy of mortality events, increases in the probability of a given 40 mortality type will tend to be reflected as increased incidence as the age at 41 which that event occurs decreases. Thus, all else being equal, a given source of 42 mortality will not exceed approximately 1/N, where N is the total number of 43 possible attributed, independent causes of mortality. Of course, all else is not 44 equal, but nevertheless we would expect a saturation effect since the cancers 45 in T&V’s dataset tend to be life threatening at older ages (and therefore have 46 less primacy). Second, tissues that are especially vulnerable to life-threatening 47 cancers would be expected to evolve stronger means of protection [6]. That all 48 tissues do not employ the same protection mechanisms would be suggestive of 49 either a fitness cost of cancer protection to the organism (i.e. that the cost of 50 added protection in terms of reductions in survival and reproduction outweighs 51 the benefits of lowered risks of life-threatening cancer), or that the phylogenetic 52 emergence of tissue specific protection was somehow linked with tissue differ- 53 entiation during ontogeny. Therefore, for either or both hypotheses, we would 54 expect the gradient of the correlation between risk and lscd to become shallower 55 with increasing lscd, as illustrated in Figure 4. 56 57 4 58 59 60 http://mc.manuscriptcentral.com/issue-ptrsb Page 6 of 21 Submitted to Phil. Trans. R. Soc. B - Issue

1 2 3 4 5 6 7 A simple model that is consistent with these assumptions is 8 − − 9 y = − log(a + e x b), (4) 10 where y is log(cancer risk) and x is log(lscd). For small x this function ap- 11 proaches y = x + b (slope = 1), and for large x it approaches y = − log a (slope 12 = 0). We used a least-squares method to fit the nonlinear model to data (using 13 the nls function in the R statistical language [13]). 14 Figure 5A shows the result of fitting the above model to the data for all 15 31 cancer types in the T&V dataset and 3 additional neuroendocrine cancers 16 (small-cell lung carcinoma, and colorectal and small intestine carcinoids – see 17 Supplementary materials for data and sources). According to this model, most 18 of the typesFor with higher Review than expected risk belong Only to subpopulations exposed 19 to carcinogens (hollow circles in Figure 5A). These include lung cancer in smok- 20 ers, intestinal cancer in those with certain inherited genetic alterations, liver 21 and head and neck cancer in those infected with an , and basal cell 22 carcinoma, which is generally correlated with a combination of genetic factors 23 and UV-light exposure, and which is very rarely fatal [14]. 24 When the risks related to specific subpopulations are omitted from the T&V 25 dataset, as expected, the lifetime risk per cancer type saturates at a lower level. 26 In the extended dataset with three additional cancer types, the saturation level 27 is approximately 0.5% (95% CI: 0.3-1.3%, Figure 5B). Moreover, the additional 28 data points (filled circles in Figure 5B) do not change the model fit. There- 29 fore all of the data appear to be consistent with a model in which the risk of 30 life-threatening cancer increases with lscd with a slope 1, until it is bounded by 31 a threshold of c. 0.3-1.3%; i.e., well below the theoretical maximum of 100%. 32 The fit of this model is statistically similar to that of the linear model (residual 33 standard errors 0.59 and 0.58, respectively), and, as in the preceding analysis, 34 s and d have independent, non-correlated statistical effects on variation in inci- 35 dence in the alternative dataset (p < 0.005). However, the non-linear model is more biologically plausible, and it may therefore reveal more about the multiple 36 factors that determine cancer risk, including natural selection. 37 38 39 Variation between tissues 40 We have shown that tissues with higher numbers of stem cell divisions generally 41 have a lower risk of cancer per stem cell division. However, one would also expect 42 cancer risk per stem cell division to be approximately constant within sets of 43 related tissues, which are likely, though not certain, to have similar protection 44 mechanisms. By splitting the data into subsets of similar cancer types, we should 45 be able to divide the variation in risk into two parts (Figure 6). If the members of 46 each subset indeed have similar cancer risk per stem cell division then variation 47 within subsets will be mostly due to lscd, whereas variation between subsets 48 will be related to tissue type and/or environment (e.g., mutagens). 49 In particular, if we assume that carcinogenesis requires a sequence of M 50 mutations then 51 M M−1 M cancer risk ≈ s(dC) ≈ lscd × d C , (5) 52 53 as discussed by Nunney and Muir in this issue [15]. Thus 54 ≈ − 55 log(cancer risk) log(lscd) + (M 1) log(d)+ M log(C). (6) 56 57 5 58 59 60 http://mc.manuscriptcentral.com/issue-ptrsb Submitted to Phil. Trans. R. Soc. B - Issue Page 7 of 21

1 2 3 4 5 6 7 If division rates d and numbers of mutations M are similar within each subset 8 then the ratio of risks for two cancer types is given by 9 10 log(cancer risk1) − log(cancer risk2) ≈ log(lscd1) − log(lscd2). (7) 11 12 Therefore the slope of the regression line for each subset will be approximately 13 1, and the cancer risk per stem cell division (that is, the risk of acquiring all 14 necessary mutations, relative to lscd) will be approximately the value at which 15 each regression line intercepts the vertical axis (i.e. log lscd = 0, Figure 6). We hypothesized that cancer risk per stem cell division might be associated 16 with either anatomical site or the cell type of the transformed tissue. We first 17 divided the cancer types by anatomical site (Table 1 in Supplementary materi- 18 als), accordingFor to the topographicalReview codes in the InternationalOnly Classification of 19 Diseases for Oncology (ICD-O) [16], which is widely used in clinical diagnosis. 20 We included data for three neuroendocrine cancers not considered by T&V, but 21 excluded six cancer types affecting particular groups (lung cancer in smokers, 22 intestinal cancer in those with certain inherited genetic alterations, and liver 23 and head and neck cancer in those infected with an oncovirus). We then fitted 24 a two-factor regression model to the subsets containing at least two data points 25 (9 subsets, 24 cancer types). According to this model, for each cancer type i, 26 27 log cancer risk(i)= A log lscd(i)+ B(subset(i)), (8) 28 where A is the gradient of the linear regression line (assumed to be the same for 29 all subsets), and B is the intercept (which depends on the subset). Therefore 30 there are nine parameters to be estimated (one slope, and eight subset-specific 31 intercepts). 32 The two-factor regression model explains most of the variation in cancer 33 risk not explained by the model of T&V. For the extended dataset, the model 34 explains 89% of the variation in cancer risk among 24 cancer types (F9,14 = 12.7, 35 Figure 7A). Log lscd by itself explains 68% of the variation, similar to the 36 figure for the set of 31 cancer types analyzed by T&V, whereas the anatomical 37 subset factor explains an additional 21% (subset effect: p = 0.02). Supporting 38 this finding, the risks for three cancer types not considered by T&V (filled 39 circles in Figure 7A) are almost exactly as predicted. There is no significant 40 interaction between the log lscd and subset factors (p = 0.37). Furthermore the 41 gradient within the subsets is 0.86, with standard error 0.14, and is therefore, as 42 predicted, not significantly different from 1. An alternative model that assumes 43 the gradient for each subset is exactly 1 also explains 89% of the variation −5 44 (F8,15 = 15.9; subset effect: p< 1 × 10 ). 45 Note that we chose to include skin cancers in this analysis even though most 46 of the skin cancer risk in the T&V dataset is associated with UV-light exposure 47 [17]. Since UV-light exposure is assumed to increase the mutation risk per stem 48 cell, we would expect this environmental factor to shift the regression line for 49 the skin cancer subset upwards, towards higher cancer risk, but we would still 50 expect the slope to be approximately 1. Indeed, the model fit for skin cancer is 51 similar to that for the other subsets (Figure 7A). 52 Much of the remaining variation is due to the brain cancer subset, but it can 53 be argued that this subset is poorly defined. Whereas glioblastoma typically 54 develops in the mature brain, medulloblastoma is considered to originate in 55 the different environment of the early embryo [18], and it is the only cancer in 56 57 6 58 59 60 http://mc.manuscriptcentral.com/issue-ptrsb Page 8 of 21 Submitted to Phil. Trans. R. Soc. B - Issue

1 2 3 4 5 6 7 the T&V dataset that predominantly occurs during childhood (median age 9 8 years at diagnosis). When the brain cancer subset is excluded, the two-factor 9 regression model explains 92% of the variation (F8,13 = 18.8) and the subset 10 factor has a more significant effect (p = 0.005). 11 Apart from brain cancers, there is only one cancer type that substantially 12 deviates from the topographical subsets model: although colorectal and duo- 13 denum adenocarcinomas lie almost exactly on a line of slope 1 (also grouping 14 with pancreatic cancers), small intestine adenocarcinoma falls well below this 15 line, being approximately ten times less common than predicted. Therefore a 16 testable prediction of our model is that small intestine adenocarcinoma differs 17 in some important way from the four other intestinal cancers (colorectal and 18 duodenumFor adenocarcinomas, Review and colorectal and smallOnly intestinal carcinoids), or 19 that the estimated lifetime number of stem cell divisions for this cancer type is 20 inaccurate. 21 We also divided the data according to ICD-O morphological code, which de- 22 scribes the cancer cell type. This resulted in five subsets containing at least two 23 data points, which together included 17 cancer types (Table 2 in Supplementary 24 materials). In the two-factor regression model (Equation 8), the morphological 25 subset factor is not significant (p = 0.32). Therefore we found no evidence that 26 cancer risk in this dataset is related to cell type, independent of anatomical site 27 (Figure 7B). Nevertheless, since topography and morphology are moderately 28 correlated in the T&V dataset, our results do not rule out a combined effect. Given that the gradient of the correlation between cancer risk and lscd ap- 29 pears to be close to 1 for each topographical type, we can calculate 30 31 cancer risk per stem cell division = risk ÷ lscd. (9) 32 33 The estimated risks per stem cell division for each individual cancer type are 34 shown in Figure 7C. These estimated risks vary by nearly four orders of mag- nitude – from less than 10−14 for small intestine adenocarcinoma, to approxi- 35 −11 36 mately 10 for osteocarcinoma and thyroid cancers. An untested hypothesis 37 to explain this variation is that the number of genetic or epigenetic alterna- 38 tions required to obtain cancers typical of different anatomical sites, differs in 39 characteristic ways (see also Nunney & Muir [15], this issue). 40 Therefore variation in cancer incidence in the dataset of T&V can be ex- plained by the total number of stem divisions (lscd; [9]), but can also be under- 41 stood as variation explained by tissue size and by variation in cancer risk per 42 stem cell division (this study). When using the composite quantity lscd and 43 only considering cancers that are not linked to heredity, disease or mutagenic 44 exposure, we find that anatomical site explains most of the residual variation. 45 46 47 Discussion 48 49 Despite limitations in the Tomasetti and Vogelstein dataset, it contains a wealth 50 of information that goes beyond their initial analysis. We have made four new 51 findings based on their dataset, and we have verified that these findings are 52 consistent with additional data. First, the total number of stem cells and the 53 lifetime number of divisions per stem cell each significantly, and independently 54 of one another, explain variation in cancer incidence (Figure 1). Indeed, our 55 finding of a significant correlation of s with cancer risk is consistent with the 56 57 7 58 59 60 http://mc.manuscriptcentral.com/issue-ptrsb Submitted to Phil. Trans. R. Soc. B - Issue Page 9 of 21

1 2 3 4 5 6 7 prediction that cancer incidence increases with the standing population size of 8 an organ [19, 20]. One possible mechanism for the tissue size effect is mutations 9 associated with the 2s cell divisions during ontogeny for certain tissues [21]. 10 Second, our more intuitive measure of Extra Cancer Risk yields results that 11 largely concord with T&V, but also yield certain notable differences (Figure 2). 12 Third, when assessing a subset of 27 cancers that are not primarily linked to 13 pathogens, disease, or carcinogenic exposure, we find a saturating effect of total 14 stem cell divisions on cancer incidence, with a plateau at about 0.5% (Figure 5). 15 This could be explained either by the primacy of mortality events limiting max- 16 imal mortality for any single type of event and/or increased cancer prevention 17 mechanisms in tissues with the most total stem cell replications. Fourth, when 18 dividingFor a subset of 24Review cancers by anatomical site, Only we find that each type shows 19 the same slope of c. 1, but is displaced over 4 orders of magnitude in risk per 20 stem cell division, consistent with the hypothesis that different tissues have con- 21 trasting protection mechanisms against cancer. Data for three neuroendocrine 22 cancers, which were not considered by T&V, closely fit the predictions of this 23 model. Our findings are consistent with the hypothesis that natural selection 24 has resulted in differential cancer prevention in different anatomical sites [6], and 25 to the best of our knowledge is the first such analysis for any cellular disease. 26 We briefly discuss the implications of these findings below. 27 Our analysis clarifies one of the main findings of Tomasetti and Vogelstein 28 [9]: variation in cancer incidence is statistically explained by the independent effects of stem cell division rate (d) and stem cell number (s). Our analyses 29 indicate, both for the full T&V dataset and for a dataset of cases that are a 30 priori least likely to be derived from mutagenic exposure, that both s and d 31 significantly contribute to explaining most of the variation in cancer incidence, 32 and variation explained by s and d are approximately equal in the full T&V 33 dataset. We hypothesize however different relative contributions of d and s to 34 explaining variation in risk of cancers significantly associated with mutagenic 35 effects (e.g., certain cancers with high ERS). Specifically, mutagenic exposure 36 may result in stem cell death and replacement by mutated daughter cells [22]. 37 Thus, we predict that the incidence of mutagen-derived cancers should signif- 38 icantly correlate with the number of standing “targets” (s), and little or not 39 at all with stem cell division rates (d). We were not able to test this hypoth- 40 esis, not only because of the small number of cases in the T&V dataset, but 41 also because mutagenic exposure is likely to vary considerably both within and 42 between cancer types. 43 We have further shown that when considering the total number of stem 44 cell divisions as a single metric that explains most of the variation in cancer 45 incidence, the remaining variation can be significantly explained by anatomical 46 site, corresponding to the biological setting in which cancer arises. The pattern 47 is consistent with differential cancer risks per stem cell division, such that in 48 anatomical sites that harbour a relatively large number of stem cell divisions, 49 each division event entails a relatively low risk of contributing to carcinogenesis. 50 Our results therefore suggest that variation in cancer risk across human tissues 51 is analogous to Peto’s paradox, which is the observed lack of variation in cancer 52 risk across animal species with different body masses and/or lifespans [1, 2, 53 3, 4]. However, the inter-tissue relationship is not flat as in the interspecific 54 comparison, but appears to be an increasing, saturating function. Most of the 55 hypotheses proposed to explain Peto’s paradox invoke the evolution of stronger 56 57 8 58 59 60 http://mc.manuscriptcentral.com/issue-ptrsb Page 10 of 21 Submitted to Phil. Trans. R. Soc. B - Issue

1 2 3 4 5 6 7 cellular or tissue-level cancer prevention or suppression in larger and longer- 8 lived animal species (reviewed in [3]). Likewise, our results are consistent with 9 a related conjecture of Peto [1] that tissues with high levels of stem cell turnover, 10 such as the lining of the small intestine, might have evolved especially powerful 11 anti-cancer mechanisms. For example, a larger number of mutations might be 12 necessary to initiate cancer in such tissues ([15], this issue), or tissue architecture 13 might act to contain precancerous growths [1]. Most of the cancer types in 14 the dataset occur at older ages and, as has been argued previously (e.g., [4]), 15 such cancers would be shielded from present-day natural selection. Natural 16 selection for cancer prevention is consistent with observations of occurrence at 17 post-reproductive ages, yet maintaining the evolved protection mechanisms that 18 reduce incidencesFor at youngerReview ages [23, 4]. Our analysis Only with an extended dataset 19 confirms our preliminary findings for the T&V dataset [11] and tests more recent 20 predictions [24]. 21 We have shown how simple rules (effects of total number of stem cell divi- 22 sions and anatomical site) can predict variation in cancer incidence with high 23 confidence, when looking across cancer types. By extension, we speculate that 24 the same rules also hold across individuals: having more stem cell divisions in 25 a given anatomical site would then put an individual at greater risk for cancer 26 originating at that site. We have not investigated whether variation in the total 27 number of stem cell divisions between individuals is predictive of cancer risk, 28 but some studies are suggestive of this type of effect (e.g., [20, 25, 4, 26]). Thus, to the extent that a given individual is potentially more prone to certain cancers 29 based on more expected lifetime stem cell divisions, this can be regarded as a 30 risk factor. 31 Future research should extend Tomasetti and Vogelstein’s dataset to other 32 tissue types and cancer types within tissues (most notably high incidence cancers 33 of the breast and prostate). Moreover, we need to identify possible tissue-specific 34 mechanisms of cancer prevention to test the hypothesis that natural selection 35 has influenced not only age related patterns in cancer incidence, but also tissue 36 specific adaptations and cancer as a possible evolutionary constraint on tissue 37 size [7, 8]. 38 39 Acknowledgements 40 41 This work was supported by grants from the Agence National de la Recherche 42 (EvoCan ANR-13-BSV7-0003-01) and INSERM (“Physique Cancer” (CanEvolve 43 PC201306) to MEH. C´eline Devaux, Vincent Devictor, Robert Gatenby, Pierre 44 Gauz`ere, Urszula Hibner, Pierre Martinez, Len Nunney and Christian Tomasetti 45 provided helpful advice. 46 47 Supplementary materials 48 Supplementary materials contain additional data and computer code used to 49 implement our statistical models. 50 51 52 53 54 55 56 57 9 58 59 60 http://mc.manuscriptcentral.com/issue-ptrsb Submitted to Phil. Trans. R. Soc. B - Issue Page 11 of 21

1 2 3 4 5 6 7 References 8 9 [1] R Peto. , multistage models, and short-term mutagenicity 10 tests. Origins of human cancer, 4:1403–1428, 1977. 11 12 [2] A M Leroi, V Koufopanou, and A Burt. Cancer selection. Nature Reviews 13 Cancer, 3(3):226–231, 2003. 14 [3] A F Caulin and C C Maley. Peto’s Paradox: evolution’s prescription for 15 cancer prevention. Trends in Ecology & Evolution, 26(4):175–182, 2011. 16 17 [4] L Nunney. The real war on cancer: the evolutionary dynamics of cancer 18 suppression.ForEvolutionary Review applications, 6(1):11–19, Only 2013. 19 [5] CA Aktipis, AM Boddy, G Jansen, U Hibner, ME Hochberg, C Maley, and 20 GS Wilkinson. Cancer across the tree of life: cooperation and cheating in 21 multicellularity. In press. 22 23 [6] L Nunney. Lineage selection and the evolution of multistage carcinogen- 24 esis. Proceedings of the Royal Society of London B: Biological Sciences, 25 266(1418):493–498, 1999. 26 [7] A Boddy, H Kokko, F Breden, GS Wilkinson, and CA Aktipis. Cancer sus- 27 ceptibility and reproductive trade-offs: A model of the evolution of cancer 28 defenses. In press. 29 30 [8] H Kokko and ME Hochberg. Towards cancer-aware life-history modelling. 31 In press. 32 33 [9] C Tomasetti and B Vogelstein. Variation in cancer risk among tissues can be explained by the number of stem cell divisions. Science, 346(6217):78– 34 81, January 2015. 35 36 [10] L Altenberg. Statistical Problems in a Paper on Variation In Cancer Risk 37 Among Tissues, and New Discoveries. arXiv, 1501.04605, 2015. 38 39 [11] Robert Noble, Oliver Kaltz, and Michael E Hochberg. Statistical inter- 40 pretations and new findings on Variation in Cancer Risk Among Tissues. arXiv 41 , pages 1–17, 2015. 42 [12] C Tomasetti and B Vogelstein. Musings on the theory that variation in 43 cancer risk among tissues can be explained by the number of divisions of 44 normal stem cells. arXiv, 1501.05035, January 2015. 45 46 [13] R Core Team. R: A Language and Environment for Statistical Computing. 47 R Foundation for Statistical Computing, Vienna, Austria, 2013. 48 [14] CSM Wong, RC Strange, and JT Lear. Basal cell carcinoma. BMJ: British 49 Medical Journal, 327(7418):794, 2003. 50 51 [15] L Nunney and B Muir. Peto’s paradox and the hallmarks of cancer: con- 52 structing an evolutionary framework for understanding the incidence of 53 cancer. In press. 54 55 56 57 10 58 59 60 http://mc.manuscriptcentral.com/issue-ptrsb Page 12 of 21 Submitted to Phil. Trans. R. Soc. B - Issue

1 2 3 4 5 6 7 [16] A Fritz, C Percy, A Jack, K Shanmugaratnam, L Sobin, DM Parkin, and 8 S Whelan. International classification of diseases for oncology, third edition. 9 2000. 10 11 [17] J Scotto, T R Fears, Joseph F Fraumeni, et al. Incidence of nonmelanoma 12 skin cancer in the United States, 1983. 13 [18] M F Roussel and M E Hatten. Cerebellum: development and medulloblas- 14 toma. Current topics in developmental biology, 94:235, 2011. 15 16 [19] D Albanes and M Winick. Are cell number and cell proliferation risk factors 17 for cancer? Journal of the National Cancer Institute, 80(10):772–775, 1988. 18 For Review Only 19 [20] R Roychoudhuri, V Putcha, and H Møller. Cancer and laterality: a study of the five major paired organs (UK). Cancer Causes & Control, 17(5):655– 20 662, 2006. 21 22 [21] J DeGregori. Challenging the axiom: does the occurrence of oncogenic mu- 23 tations truly limit cancer development with age? Oncogene, 32(15):1869– 24 1875, 2013. 25 26 [22] J Cairns. Somatic stem cells and the kinetics of mutagenesis and carcino- 27 genesis. Proceedings of the National Academy of Sciences, 99(16):10567– 28 10570, 2002. 29 [23] M E Hochberg, F Thomas, E Assenat, and U Hibner. Preventive evolution- 30 ary medicine of cancers. Evolutionary Applications, 6(1):134–143, 2013. 31 32 [24] Benjamin Roche, Beata Ujvari, and Fr´ed´eric Thomas. Bad luck and cancer: 33 Does evolution spin the wheel of fortune? BioEssays, 53:n/a–n/a, 2015. 34 [25] Geoffrey C Kabat, Matthew L Anderson, Moonseong Heo, H Dean Hos- 35 good, Victor Kamensky, Jennifer W Bea, Lifang Hou, Dorothy S Lane, 36 Jean Wactawski-Wende, JoAnn E Manson, et al. Adult stature and risk 37 of cancer at different anatomic sites in a cohort of postmenopausal women. 38 Cancer Epidemiology Biomarkers & Prevention, 22(8):1353–1363, 2013. 39 40 [26] Jane Green, Benjamin J Cairns, Delphine Casabonne, F Lucy Wright, 41 Gillian Reeves, Valerie Beral, Million Women Study collaborators, et al. 42 Height and cancer incidence in the million women study: prospective co- 43 hort, and meta-analysis of prospective studies of height and total cancer 44 risk. The lancet oncology, 12(8):785–794, 2011. 45 46 Figure captions 47 48 Figure 1 49 50 Relationships between the number of stem cells per tissue (s), the lifetime num- 51 ber of replications per stem cell in that tissue (d) and lifetime cancer risk, across A 52 31 cancer types (data from Table S1 in [9]). Relationship between stem cell 53 replication (d) and cancer risk, after statistically correcting for the effect of stem 54 cell number (s). This correction was done by regressing cancer risk on s, and then performing the partial regression of residual cancer risk on d. The r2 value 55 56 57 11 58 59 60 http://mc.manuscriptcentral.com/issue-ptrsb Submitted to Phil. Trans. R. Soc. B - Issue Page 13 of 21

1 2 3 4 5 6 7 is the square of the partial regression coefficient and quantifies the amount of 8 variation in residual cancer risk explained by stem cell division. B Relationship 9 between stem cell number (s) and cancer risk, after statistically correcting for 10 the effect of stem cell replication (d). The partial r2 quantifies the variation 11 in residual cancer risk explained by stem cell replication. C Illustration of the 12 combined positive effects of stem cell number (s) and stem cell replication (d) on 13 predicted cancer risk. Predicted values were obtained from the multiple regres- 14 sion of cancer risk on d and s (see text). In 0.5 log-intervals we assigned a colour 15 gradient to the predicted values, ranging from light orange (low predicted risk) 16 to dark red (high predicted risk). Thus, cancer risk increases with increasing 17 values of both s and d. All analyses and figures use log-transformation of s, d 18 and cancerFor risk. The blackReview lines in A and B represent Only regression lines, and the 19 shaded areas the 95% confidence intervals around the regression. Colour-coding 20 based on Fig. 2 in [9], denoting deterministic D-tumours (blue) and replicative 21 R-tumours (green). 22 23 Figure 2 24 25 Residual lifetime risk of 31 cancer types, calculated as the difference between observed values and predictions of our multiple regression model (Figures 1A, 26 1B). Most of the cancers that T&V classed as deterministic D-tumours (blue 27 bars) also have high residual risks according to our alternative metric. Many 28 such cancers are associated with known causative factors (, chemical 29 carcinogens, or inherited cancer susceptibility genes). The additional identifi- 30 cation of cancers with very low residual lifetime risks (red bars) suggests that 31 some tissue types may be differentially resistant to tumours. 32 33 Figure 3 34 35 Relationship between cancer risk per stem cell division (risk / lscd) and lifetime 36 number of stem cell divisions (lscd) in 31 cancer types. The negative correlation 37 contradicts the hypothesis that the cancer risk per stem cell division is the same 38 for all tissues (in which case the line would be flat, with gradient 0). Dashed lines 2 −6 39 show 95% confidence intervals for the linear regression (R = 0.58, p< 1×10 ). 40 41 Figure 4 42 43 A hypothetical one-factor, non-linear model of the relationship between cancer 44 risk and lifetime number of stem cell divisions (lscd). If each stem cell division 45 has the same probability of causing cancer then there should be a linear rela- tionship between cancer risk and lscd with a gradient of 1. However, we argue 46 that the risk cannot rise indefinitely, but must be bounded by a maximum limit, 47 either due to the primacy of other causes of mortality and/or due to differential 48 cancer prevention in tissues with larger total numbers of stem cell divisions. 49 50 51 Figure 5 52 A Relationship between cancer risk and lifetime number of stem cell divisions 53 (lscd) in 34 cancer types, described by a model that assumes that the gradient 54 of the correlation is 1 for small lscd, and is 0 for large lscd. Data from T&V are 55 56 57 12 58 59 60 http://mc.manuscriptcentral.com/issue-ptrsb Page 14 of 21 Submitted to Phil. Trans. R. Soc. B - Issue

1 2 3 4 5 6 7 shown as crosses or hollow circles; additional data are shown as filled circles. 8 The model asymptotes are included as dashed lines. B The same model fitted 9 to the set of 27 cancer types not associated with a high-risk subpopulation (the 10 excluded data points are shown as hollow circles in A). Dotted lines indicate 11 the approximate 95% confidence interval of the regression curve. 12 13 Figure 6 14 15 A two-factor, linear model of the relationship between cancer risk and lifetime 16 number of stem cell divisions (lscd). In this case the cancer types are divided 17 into subsets according to tissue type. The subsetting partitions variation into 18 within-subsetFor variation Review (due to lscd) and between-subset Only variation (due to tissue 19 type). The gradient of the correlation within subsets is expected to be close 20 to 1. The dashed line indicates a hypothetical maximum risk threshold. For 21 each tissue type, the cancer risk per stem cell division can be estimated by 22 extrapolating the regression line to the point where lscd = 1 (i.e. log lscd = 0). 23 24 Figure 7 25 A Relationship between cancer risk and lifetime number of stem cell divisions 26 (lscd) in nine topographically-defined subsets of 24 cancer types (Table 1 in Sup- 27 plementary materials). The model assumes that the risk per stem cell division 28 may differ between subsets but that the slope of the correlation is the same for 29 each subset. Data from T&V are shown as crosses; additional data are shown as 30 filled circles. B Relationship between cancer risk and lifetime number of stem 31 cell divisions (lscd) in five morphologically-defined subsets of 17 cancer types 32 (Table 2 in Supplementary materials). C Cancer risk per stem cell division for 33 28 cancer types, calculated by dividing risk by lscd. This formula assumes that 34 the correlation between risk and lscd has a gradient of 1 for each tissue type, 35 which is supported by the results of the regression model (Equation 4). Cancer 36 types are coloured by topographic subset, according to the scheme shown in A. 37 Four types that belong to topographic subsets with only one member (and so 38 were excluded of the analysis shown in A) are shown in grey. 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 13 58 59 60 http://mc.manuscriptcentral.com/issue-ptrsb Submitted to Phil. Trans. R. Soc. B - Issue Page 15 of 21

1 2 3 4 5 6 7 8 9 10 11 12 13 For Review Only 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 http://mc.manuscriptcentral.com/issue-ptrsb 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 16 of 21 Submitted to Phil. Trans. R. Soc. B - Issue

Colorectal adenocarcinoma with FAP 1 Lung adenocarcinoma (smokers) 2 Colorectal adenocarcinoma with Lynch syndrome 3 Head & neck squamous cell carcinoma with HPV−16 4 Thyroid papillary/follicular carcinoma 5 Gallbladder non papillary adenocarcinoma 6 Duodenum adenocarcinoma with FAP 7 For ReviewBasal cell carcinoma Only 8 Glioblastoma 9 Ovarian germ cell 10 Hepatocellular carcinoma with HCV 11 Osteosarcoma of the legs 12 Osteosarcoma 13 Head & neck squamous cell carcinoma 14 15 Testicular germ cell cancer 16 Thyroid medullary carcinoma 17 Colorectal adenocarcinoma 18 Esophageal squamous cell carcinoma 19 Lung adenocarcinoma (nonsmokers) 20 Osteosarcoma of the arms 21 Osteosarcoma of the pelvis 22 Melanoma 23 Pancreatic ductal adenocarcinoma (acinar) 24 Osteosarcoma of the head 25 Chronic lymphocytic leukemia 26 Hepatocellular carcinoma 27 Acute myeloid leukemia 28 29 Medulloblastoma 30 Duodenum adenocarcinoma 31 Pancreatic endocrine (islet cell) carcinoma 32 Small intestine adenocarcinoma 33 34 35 36 −1.5 −1.0 −0.5 0.0 0.5 1.0 37 http://mc.manuscriptcentral.com/issue-ptrsbresidual lifetime risk 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Submitted to Phil. Trans. R. Soc. B - Issue Page 17 of 21

1 For Review Only 2 3 −11 4 5 6 7 8 −12 9 10 11 12 13 −13 (risk per division)

14

15 10 16

17 log 18 −14 19 20 21 22 23 6 8 10 12 14 24 http://mc.manuscriptcentral.com/issue-ptrsb 25 log lscd 26 10 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 18 of 21 Submitted to Phil. Trans. R. Soc. B - Issue

1 2 3 4 5 6 7 high$number$of$divisions:$ 8 9 satura3on$of$risk$(slope$=$0)$ 10 For Review Only 11 12 13 14 $ 15 16 risk 17 18 19 20 cancer$ 21 22 23 low$number$of$divisions:$ 24 25 risk$propor3onal$to$lcsd$ 26 27 (slope$=$1)$ 28 29 30 31 32 life3me$number$of$stem$cell$divisions$(lscd)$ 33 34 35 36 37 38 39 40 http://mc.manuscriptcentral.com/issue-ptrsb 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Submitted to Phil. Trans. R. Soc. B - Issue Page 19 of 21 ● A 0 Colorectal adenocarcinoma with FAP B 0 Colorectal adenocarcinoma with Lynch syndrome ● ● 1 Basal cell carcinoma 2 3 −1 Lung adenocarcinoma (smokers) ● ● ● −1 4 ● 5 For Review Only 6 7 −2 −2 8 9 ● ●

10 incidence incidence ● ● 11 10 −3 10 −3 12 g g o o 13 l l 14 −4 ● −4 ● 15 16 17 18 −5 −5 19 20 21 22 6 7 8 9 10 11 12 13 6 7 8 9 10 11 12 13 23 http://mc.manuscriptcentral.com/issue-ptrsb 24 log10 lscd log10 lscd 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 20 of 21 Submitted to Phil. Trans. R. Soc. B - Issue

1 2 3 4 5 %ssue' %ssue' 6

7 ' type'1' type'2' 8

9 risk 10 varia%on'due' 11 For Review Only 12 to'%ssue'type' 13 14 cancer' 15 varia%on' 16 17 due'to'lscd' 18 19 20 21 22r1' 23 24 ri ='cancer'risk'per' 25 26 stem'cell'division' 27 for'%ssue'type i 28 29 30r2' 31 32 33 34 1' 35 life%me'number'of'stem'cell'divisions'(lscd)' 36 37 38 39 40 http://mc.manuscriptcentral.com/issue-ptrsb 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 A Submitted to Phil. Trans. R.B Soc. B - Issue Page 21 of 21

0 Thyroid Skin 0 Squamous cell 1 Bone Blood Adenocarcinoma 2 Brain Pancreas Germ cell 3 −1 −1 4 Lung Intestine Carcinoid 5 Gullet, mouth, etc. Osteocarcinoma 6 7 −2 −2 8 9 ● ●

10 incidence incidence −3 ● −3 ● 11 10 10

12g g o o 13l ● For Reviewl Only 14 −4 −4 15 16 17 18 −5 −5 19 20 6 7 8 9 10 11 12 13 6 7 8 9 10 11 12 13 21 22 log10 lscd log10 lscd 23 24C 25 Gallbladder non papillary adenocarcinoma 26 Osteosarcoma of the legs 27 Ovarian germ cell 28 Thyroid papillary/follicular carcinoma Osteosarcoma 29 Osteosarcoma of the pelvis 30 Osteosarcoma of the arms 31 Glioblastoma 32 Thyroid medullary carcinoma Osteosarcoma of the head 33 Lung small−cell carcinoma (non−smokers) 34 Esophageal squamous cell carcinoma 35 Testicular germ cell cancer 36 Lung adenocarcinoma (nonsmokers) Head & neck squamous cell carcinoma 37 Medulloblastoma 38 Basal cell carcinoma 39 Colorectal neuroendocrine (carcinoid) 40 Colorectal adenocarcinoma Chronic lymphocytic leukemia 41 Pancreatic ductal adenocarcinoma (acinar) 42 Duodenum adenocarcinoma 43 Pancreatic endocrine (islet cell) carcinoma 44 Acute myeloid leukemia Melanoma 45 Hepatocellular carcinoma 46 Small intestine neuroendocrine (carcinoid) 47 Small intestine adenocarcinoma 48 49 http://mc.manuscriptcentral.com/issue-ptrsb 50 −15 −14 −13 −12 −11 51 log10 cancer risk per stem cell division 52 53 54 55 56 57 58 59 60