Le ciblage de B-RAF
Cumulative Haploinsufficiency and Triplosensitivity Drive Aneuploidy Patterns and Shape the Cancer Genome Cell
Mutational landscape and significance across12 major cancer types Nature
Targeting RAF kinases for cancer therapy: BRAF-mutated melanoma and beyond Nature Reviews Cancer
Réalisation : Bibliothèque médicale - Gustave Roussy Resource
Cumulative Haploinsufficiency and Triplosensitivity Drive Aneuploidy Patterns and Shape the Cancer Genome
Teresa Davoli,1,2,5 Andrew Wei Xu,2,4,5 Kristen E. Mengwasser,1,2 Laura M. Sack,1,2 John C. Yoon,2,3 Peter J. Park,2,4 and Stephen J. Elledge1,2,* 1Howard Hughes Medical Institute, Department of Genetics, Harvard Medical School, Boston, MA 02115, USA 2Division of Genetics, Brigham and Women’s Hospital, Boston, MA 02115, USA 3Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA 4Center for Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA 5These authors contributed equally to this work *Correspondence: [email protected] http://dx.doi.org/10.1016/j.cell.2013.10.011
SUMMARY such as translocations and chromothripsis. Understanding how these events drive tumorigenesis is a major unmet need in Aneuploidy has been recognized as a hallmark of cancer research. cancer for more than 100 years, yet no general the- Though ostensibly random, these alterations follow a ory to explain the recurring patterns of aneuploidy nonrandom pattern that suggests that they are under selection in cancer has emerged. Here, we develop Tumor and are likely to be cancer drivers rather than passengers. Suppressor and Oncogene (TUSON) Explorer, a If so, we should be able to explain how they drive tumori- computational method that analyzes the patterns genesis. A recent clue as to how this might work came from the integration of a genome-wide RNAi proliferation screen of mutational signatures in tumors and predicts with focal SCNA information (Solimini et al., 2012). The screen the likelihood that any individual gene functions identified STOP and GO genes that are negative and positive as a tumor suppressor (TSG) or oncogene (OG). regulators of cell proliferation, respectively. Hemizygous By analyzing >8,200 tumor-normal pairs, we pro- recurring focal deletions were enriched for STOP genes vide statistical evidence suggesting that many and depleted of GO genes, suggesting that the deletions more genes possess cancer driver properties than maximize their protumorigenic phenotype through cumula- anticipated, forming a continuum of oncogenic tive haploinsufficiency of STOP and GO genes. Haploin- potential. Integrating our driver predictions with sufficiency describes a genetic relationship in a diploid information on somatic copy number alterations, organism in which loss of one copy of a gene causes a pheno- we find that the distribution and potency of TSGs type. The converse is triplosensitivity, in which an additional (STOP genes), OGs, and essential genes (GO copy of a gene produces a phenotype. However, the distribu- tions of STOP and GO genes were not able to predict genes) on chromosomes can predict the complex aneuploidy or chromosome arm SCNA frequencies, perhaps patterns of aneuploidy and copy number varia- because they represent only one aspect of tumorigenesis tion characteristic of cancer genomes. We pro- (proliferation) or are too diluted by nonhaploinsufficient genes. pose that the cancer genome is shaped through We hypothesized that the drivers of sporadic tumorigenesis a process of cumulative haploinsufficiency and might provide a more representative and potent set of STOP triplosensitivity. and GO genes with which to explore this phenomenon. Furthermore, this gene set may possess a higher frequency INTRODUCTION of haploinsufficiency. In this study, we developed methods to identify tumor sup- A key goal of cancer research is to identify genes whose muta- pressor genes (TGSs) and oncogenes (OGs) from tumor DNA tion promotes the oncogenic state. Research over the last 40 sequences. We implicate many new drivers in cancer causation years has identified numerous potent drivers of the cancer and find many more cancer drivers than expected that exist in phenotype (Meyerson et al., 2010; Stratton et al., 2009; Vogel- a continuum of decreasing phenotypic potential. Furthermore, stein et al., 2013). Perhaps the most striking characteristics of we found that the distribution and potency of TSGs, OGs, and cancer genomes are their frequent somatic copy number alter- essential genes on chromosomes can explain copy number ations (SCNAs) and extensive aneuploidies. Deletions and ampli- alterations of whole chromosomes and chromosome arms fications of whole chromosomes, chromosome arms, or focal during cancer evolution through a process of cumulative hap- regions are rampant in cancer, as are other rearrangements loinsufficiency and triplosensitivity.
948 Cell 155, 948–962, November 7, 2013 ª2013 Elsevier Inc. ACMethod for the prediction Figure 1. Prediction of TSGs and OGs of TSGs and OGs Based on Their Mutational Profile (A) Schematic representation of our method for the Tumor Classification of mutations Samples based on their functional impact detection of cancer driver genes based on the assessment of the overall mutational profile of
Selection of traning sets each gene. The somatic mutations in each gene for TSGs,OGs and Neutral genes from all tumor samples are combined and classi- Analysis fied based on their predicted functional impact. Definition of 22 potentially The main classes of mutations (silent, missense, relevant parameters Analysis of Mutation Patterns and LOF) are depicted. for each gene (B) Schematic depicting the most important LOF Identification of the best features of the distributions of mutation types parameters through the LASSO Functional Prediction expected for a typical TSG, OG, and neutral Impact Lasso classification method Missense Prediction of TSGs and OGs gene. Compared to ‘‘neutral’’ genes, TSGs are Silent based on Lasso expected to display a higher number of in- Gene TUSON Explorer Prediction activating mutations relative to their background
Patterns of mutations expected to be Prediction of TSGs mutation rate (benign mutations), and OGs are B predictive of TSGs and OGs and OGs based on the combined significance expected to display a higher number of acti- LOF of the best parameters vating missense mutations and a characteristic Pattern Expected Missense Silent pattern of recurrent missense mutations in spe-
Neutral gene cific residues. D TUSON Explorer LOF (C) A flowchart delineating the main steps in our High ratio of damaging Missense method for identifying TSGs and OGs, from the versus benign mutations Silent Parameters for the Parameters for the classification of the mutations based on their Tumor Suppressor Gene prediction of TSGs prediction of OGs functional impact to the identification of the best LOF 1. LOF/Benign ratio 1. Entropy Score parameters through Lasso and their use for the High level of missense mutations 2. HiFi Missense/Benign ratio 2. HiFi Missense/Benign ratio and clustering of mutations Missense 3. Splicing/Benign ratio prediction of TSGs and OGs by TUSON Explorer Silent (or the Lasso method). Oncogene (D) Schematic related to (C) depicting the pa- rameters selected by Lasso employed by TUSON Explorer for the prediction of TSGs and OGs (HiFI, high functional impact). For TSGs, the parameters are the LOF/Benign ratio, the HiFI/Benign ratio, and the Splicing/Benign ratio, whereas for OGs the parameters are the Entropy score and the HiFI/Benign ratio. Also see Figure S1 and Table S1.
RESULTS eters, and this rate can be estimated by the number of mutations that are unlikely to affect its function (such as silent or func- Cancer driver genes have been described as mountains and hills tionally benign mutations), whose observed frequency is not (Wood et al., 2007). Mountains are driver genes that are very dependent on selective pressure during cancer evolution. The frequently mutated in cancer, whereas hills represent less proportion of functionally relevant mutations of particular classes frequently mutated driver genes. It has become clear from recent compared to this background mutation rate will be dependent on international sequencing efforts that most potent drivers (moun- the degree of selection and will predict the likelihood that a gene tains) have been discovered. A key issue is how to determine the will act as a cancer driver. TSGs and OGs can be distinguished identity of the significant but less frequently mutated drivers, the among the cancer driver genes based on the characteristic hills. A recent analysis searching for very high confidence cancer pattern of the different types of mutations (i.e., loss of function drivers in a database of 400,000 mutations estimated that there [LOF], missense, silent) that are typically observed for those were 71 TSGs and 54 OGs (Vogelstein et al., 2013). It is likely that two classes of drivers relative to neutral genes, as illustrated in there also exist additional functionally significant cancer drivers Figure 1B. with weaker phenotypes and lower probabilities that are selected less frequently. A central question is how to identify Identification of Parameters Predicting TSGs and OGs these genes. In principle, with more samples analyzed, greater We set out to determine the most reliable parameters for the statistical significance can be placed on the outliers, allowing prediction of TSGs and OGs in an unbiased way (Figure 1C). discovery of lower penetrance drivers. However, it is likely that We used sequence data from >8,200 tumors from the COSMIC there is more information present in the current data that may (Forbes et al., 2010) and TCGA (http://cancergenome.nih.gov/) allow these lower frequency events to be detected. databases and a recently published database (Alexandrov To approach this question, we sought to devise a method to et al., 2013) comprising >1,000,000 mutations (Figure S1 and predict TSGs and OGs in cancer based on the properties of Table S1 available online). We defined a list of 22 parameters pri- gene mutation signatures of these two distinct classes of driver marily based on the different classes of mutations and used the genes. We hypothesized that the proportion of the different types classification method Lasso and three training sets of known of mutations with different functional impact would be informa- TSGs and OGs (from the Cancer Gene Census, Futreal et al., tive in predicting these two types of drivers (Figure 1A). Each 2004)(Table S2A) and neutral genes to identify those parameters gene has a background mutation rate that is dependent on tran- that best predict the two classes of driver genes (see Experi- scription, replication timing, and possibly other unknown param- mental Procedures). We employed PolyPhen2 to predict the
Cell 155, 948–962, November 7, 2013 ª2013 Elsevier Inc. 949 functional impact of missense mutations in order to classify them mutations (see Experimental Procedures). Finally, we used an into those with potentially high (HiFI) or low (LoFI) functional extension of Liptak’s method to provide a combined p value impact (Adzhubei et al., 2010). LoFI mutations are typically for the selected parameters for each gene. For TSGs, the com- conservative amino acid changes or changes in poorly bined p values (and q values) were derived from individual values conserved residues. We defined the combination of silent and from the LOF/Benign, Splicing/Benign, and HiFI/Benign param- LoFI missense as ‘‘Benign’’ mutations to provide a larger, more eters. For OGs, the combined values were derived from the reliable value for estimating background mutation rates. We Missense Entropy and the HiFI/Benign parameters (Figure 1D). also defined the LOF mutations as the combination of nonsense The LOF/Benign parameter for discrimination between TSGs and frameshift mutations. As a majority of known OGs show an and OGs was subsequently utilized to define a final list of OGs atypical distribution of recurrent mutations in one or a few key and TSGs (see Experimental Procedures). TUSON Explorer residues, we utilized entropy, a well-defined concept in physics does not take into account SCNA information, and this allows and information theory (Shannon and Weaver, 1949), to measure us to perform a rigorous analysis of our cancer driver genes for the degree of reoccurring mutations within a gene. The Entropy their abilities to predict the frequency of deletion and amplifica- score represents the weighted sum of the probabilities, across tion (see below). a gene, that a site is mutated (see Experimental Procedures). As a second strategy to predict the probability of a given gene The best parameters found by Lasso for the prediction of being a TSG or OG, we employed the Lasso model, which also TSGs and OGs are described below and are visualized in Figures takes into account SCNAs (see Experimental Procedures). The 1D and 2. ranked lists of predicted TSGs and OGs by both Lasso and Tumor Suppressors versus Neutral Genes TUSON Explorer are contained in Tables S3A and S3B. This The most predictive parameters for TSGs are: (1) the ratio of LOF list provides a facile look-up table that can be easily sorted for mutations to Benign (p = 2.51 3 10À28, Wilcoxon, one-tailed different parameters for all those who are interested in the muta- test); (2) the ratio of Splicing to Benign mutations (p = 4.6 3 tional behavior of a given gene in this data set. 10À13); (3) the ratio of HiFI missense to Benign mutations (p = Both ranking strategies performed similarly and eliminated the 3.2 3 10À14); and (4) high-level deletion frequency (p = 1.46 3 problems of inappropriately including giant genes and genes in 10À8). A 20-fold cross-validation shows a high prediction accu- highly mutable regions (Dees et al., 2012), without the need to racy of 93.2% on these training sets (Figure 2A and Table S2B). consider expression level or replication timing (Lawrence et al., Oncogenes versus Neutral Genes 2013). Importantly, both of our strategies distinguish between The most predictive parameters for OGs are: (1) the entropy for TSGs and OGs, which are predicted to have functionally oppo- missense mutations (p = 2.2 3 10À14); (2) the ratio of HiFI site roles in the control of cell growth and have different implica- missense mutations to Benign mutations (p = 1.2 3 10À9); and tions for potential cancer therapeutics. (3) high-level amplification frequency (p = 1.4 3 10À6). The 20-fold cross-validation accuracy is 85.2% (Figure 2B and Estimates of the Numbers of TSGs and OGs Table S2B). A ranking of this nature consists of truly significant genes mixed Tumor Suppressor Genes versus Oncogenes with false-positive genes that obtain low p values by chance un- One important aim of our prediction method is the discrimination der the null hypothesis. Thus, we sought to get an estimate of the between TSGs and OGs. The most predictive parameters be- minimum number of TSGs and OGs by analyzing the distribution tween these two sets are: (1) the ratio of LOF to Benign mutations of the combined p values for each class of cancer driver genes. (p = 2.5 3 10À16); (2) high-level amplification frequency (p = To achieve this, we utilized a histogram-based method (Mosig 1.3 3 10À9); (3) high-level deletion frequency (p = 7.6 3 10À6); et al., 2001) to estimate the number of rejected hypotheses and (4) the ratio of Splicing to Benign mutations (p = 9.9 3 from the distributions of the combined p values calculated for 10À7). The 20-fold cross-validation accuracy is 91.9%. Overall, each gene. With our data set, this method estimated 320 Lasso identified parameters that make intuitive sense for these TSGs and 250 OGs (Extended Experimental Procedures). classes of genes and clearly delineated TSGs and OGs from This long list of TSGs and OGs suggests that there are many each other and from neutral genes. In sum, we identified more drivers than anticipated and that they exist in a continuum independent parameters that strongly predict and distinguish of decreasing potency (Discussion and Figure 7A). For the between TSGs and OGs (Figures 2C and 2D and Table S2B). analyses described below, we considered the top 300 TSGs (q value < 0.18) and 250 OGs (q value < 0.22) as our working lists. Identifying OGs and TSGs Given the fact that the deviation of the mutation signatures from Having identified the most predictive parameters, we developed the normal pattern is a function of the degree of selection and the a method we call Tumor Suppressor and Oncogene Explorer frequency of mutation, increasing the number of tumor samples (TUSON Explorer) that combined selected parameters to derive will detect even more cancer drivers of progressively weaker an overall significance and ranking for each gene as a potential selective pressure. To determine the potential number of TSGs TSG or OG (Figure 1D). First, we derived a p value for each upon additional sequencing, we applied TUSON and estimated gene for the ratios of LOF/Benign, Splicing/Benign, HiFI/Benign, the number of TSGs (using Mosig’s method) on random subsets and Missense Entropy based on the comparison to the neutral of the data set with increasing numbers of samples and gene set (see Experimental Procedures). For the LOF/Benign observed that the number of predicted TSGs continues to parameter, we applied a correction to normalize for the nonuni- increase with additional samples. We observed that the rate of form codon usage among genes for the occurrence of nonsense increase in the predicted number of TSGs decreases slightly at
950 Cell 155, 948–962, November 7, 2013 ª2013 Elsevier Inc. TSG versus Neutral Genes OG versus TSG ACLOF HiFI Splicing Deletion LOF Splicing Deletion Amplification Benign Benign Benign Frequency Benign Benign Frequency Frequency p=3e-28 p=3e-14 p=5e-13 p=1e-8 p=3e-16 p=1e-8 p=8e-6 p=1e-9 0.020 3.00 0.06 3.00 2.50 0.5 0.007 2.75 2.75 2.25 0.4 2.50 2.50 0.05 0.006 2.00 0.4 2.25 0.015 2.25 1.75 2.00 0.3 0.005 0.04 2.00 1.50 0.3 1.75 1.75 0.004 1.50 1.25 0.010 1.50 0.03 0.2 1.25 1.25 LOF/Benign LOF/Benign 1.00 0.2 0.003 Deletion Freq Deletion Freq Splicing/Benign Splicing/Benign HFI Miss/Benign Amplification Freq 1.00 1.00 0.02 0.75 0.002 0.005 0.75 0.75 0.1 0.50 0.1 0.50 0.50 0.01 0.001 0.25 0.25 0.25
0.00 0.00 0.0 0.000 0.00 0.0 0.000 0.00 Neut TSG Neut TSG Neut TSG Neut TSG D OG TSG OG TSG OG TSG OG TSG BRAF IDH1 -10 -5 -5 -10 2.00 <10 10 110<10 OG versus Neutral Genes KRAS B PIK3CA Missense HiFI Mean Missense Amplification Entropy Benign Polyphen2 Score Frequency NRAS pvalue OG pvalue TSG p=2e-14 p=1e-9 p=3e-10 p=1e-6 AKT1 1.00 0.40 2.00 0.06 IDH2 CTNNB1 0.80 0.35 1.75 RAC1 PTEN EZH2 0.75 0.05 TP53 U2AF1 0.30 1.50 0.60 EGFR 0.04 0.50 0.25 1.25 FBXW7 0.40 0.50 SMAD4 0.20 1.00 0.03 0.30 Missense Entropy B2M 0.15 0.75 HFI Miss/Benign
Missense Entropy 0.20 Amplification Freq 0.02 CDKN2A Mean Polyphen2 Score Mean Polyphen2 0.25 0.10 0.50 VHL