Published OnlineFirst February 2, 2010; DOI: 10.1158/1535-7163.MCT-09-0508 Published Online First on February 2, 2010 as 10.1158/1535-7163.MCT-09-0508

Research Article Molecular Cancer Therapeutics Molecular Evolutionary Analysis of Cancer Cell Lines

Yan Zhang1, Michael J. Italia2, Kurt R. Auger4, Wendy S. Halsey3, Stephanie F. Van Horn3, Ganesh M. Sathe3, Michal Magid-Slav2, James R. Brown2, and Joanna D. Holbrook4

Abstract With -wide cancer studies producing large DNA sequence data sets, novel computational approaches toward better understanding the role of mutations in tumor survival and proliferation are greatly needed. Tumors are widely viewed to be influenced by Darwinian processes, yet molecular evolutionary analysis, invaluable in other DNA sequence studies, has seen little application in cancer biology. Here, we describe the phylogenetic analysis of 353 cancer cell lines based on multiple sequence alignments of 3,252 nucleotides and 1,170 amino acids built from the concatenation of variant codons and residues across 494 and 523 genes, respectively. Reconstructed phylogenetic trees cluster cell lines by shared DNA variant patterns rather than cancer tissue type, suggesting that tumors originating from diverse histologies have similar oncogenic pathways. A well-supported clade of 91 cancer cell lines representing multiple tumor types also had significantly different gene expression profiles from the remaining cell lines according to statistical analyses of mRNA microarray data. This suggests that phylogenetic clustering of tumor cell lines based on DNA variants might reflect functional similarities in cellular pathways. Positive selection analysis revealed specific DNA variants that might be potential driver mutations. Our study shows the potential role of molecular evolutionary analyses in tumor classification and the development of novel anticancer strategies. Mol Cancer Ther; 9(2); 279–91. ©2010 AACR.

Introduction data sets present formidable bioinformatic challenges, creating a critical need for the development of new ana- The advent of lower-cost, higher-throughput DNA se- lytic approaches to help understand the dependence of quencing technologies is ushering in a new era of cancer the cancer cell on mutations and the relationships among or oncogenomics (1–4). Recent large-scale mu- different tumor types. tational analyses of cancer cell lines and clinical samples One significant objective of molecular cancer genomic have revealed complex and highly heterogeneous varia- studies is to attempt to distinguish driver mutations re- tion even among tumors derived from the same tissue sponsible for tumor proliferation and survival from co- type (5). Such findings have important implications for incidental passenger mutations resulting from relaxed the clinical treatment of cancer patients. For example, a DNA repair and replication fidelity (5). Another fre- large-scale sequencing project of cDNAs from 90 tyrosine quent study goal is the classification of tumors based kinase genes across 254 cell lines found that colon cancer on various molecular data, including DNA variants, cell lines had similar mutations in the epidermal growth transcriptional profiling, epigenetic signatures, or combi- factor receptor kinase domain that rendered non–small nations thereof. Interestingly, although tumorigenesis cell lung carcinoma patients responsive to the inhibitor has been viewed as an evolutionary process influenced gefitinib, thus suggesting another potential indication by natural selection processes (7), there has been little for this anticancer drug (6). Although the cost of genera- application of molecular evolutionary approaches in tion of DNA sequence data has greatly decreased, these the analysis of cancer mutations and understanding the relationships of diverse tumors at the DNA sequence level. Authors' Affiliations: 1Department of Biology, Pennsylvania State Here, we describe the evolutionary relationships of University, University Park, Pennsylvania and 2Computational Biology, several hundred cancer cell lines and characterize poten- 3Molecular and Cellular Technology, and 4Oncology, Research and tial mutational patterns using both phylogenetic and Development, GlaxoSmithKline, Collegeville, Pennsylvania positive selection analytic approaches. We owe much of Note: Supplementary material for this article is available at Molecular Cancer Therapeutics Online (http://mct.aacrjournals.org/). our current understanding of cancer biology to the study of established cell lines, and it is expected that further Current address for M.J. Italia: Center for Biomedical Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104. functional validation of discoveries from the oncogen- Corresponding Author: James R. Brown, GlaxoSmithKline, 1250 South ome will rely on many of the same cell lines. For exam- Collegeville Road, Mail Code UP1345, Collegeville, PA 19426. Phone: ple, the NCI-60 panel of cell lines has been extensively 610-917-6374; Fax: 610-917-7901. E-mail: [email protected] characterized at both the phenotypic and the genotypic doi: 10.1158/1535-7163.MCT-09-0508 level(8,9).Recently,Lorenzietal.(10)reportedon ©2010 American Association for Cancer Research. DNA fingerprinting of the NCI-60 cell line panel, which

www.aacrjournals.org 279

Downloaded from mct.aacrjournals.org on September 25, 2021. © 2010 American Association for Cancer Research. Published OnlineFirst February 2, 2010; DOI: 10.1158/1535-7163.MCT-09-0508

Zhang et al.

Figure 1. Flowchart of cancer cell line gene variant analysis. In-house sequencing project determined a total of 2,777 variant nucleotides (compared with GenBank RefSeq) in 55 genes across 353 human tumor cell lines. Additional mutations for those cell lines were obtained from two public data sources: COSMIC (nucleotides; ref. 15) and Tykiva (amino acids only; ref. 6). A single multiple sequence alignment was constructed from concatenated mutated codons that differed from RefSeq. An amino acid multiple sequence alignment was also constructed from translated codons and additional amino acid mutations identified in the Tykiva database. Both nucleotide and amino acid sequence alignments were used for subsequent phylogenetic and positive selection analysis (see Materials and Methods).

suggested that a few cell lines might have common activating or inhibiting enzymatic activity of specific donor origins. However, phylogenetic classification proteins, thereby causing clinical resistance to new based on DNA mutational patterns of any collection of cancer drugs targeting specific proteins such as kinases cell lines has not been previously reported. (11, 12). Because for any particular tumor gene only The basis for any molecular evolutionary analysis is a one or a few DNA variants may occur, we constructed consistent and biologically relevant multiple sequence concatenated multiple sequence alignments composed alignment of nucleotide or protein sequences from the of DNA variant codons from several hundred genes relevant taxa. Although larger-scale genome-level rear- collected across hundreds of tumor cell lines. Phyloge- rangements are often associated with tumorigenesis, netic analysis of concatenated data sets has been we focused on single nucleotide synonymous and non- used previously to study other complex evolutionary synonymous substitutions or point mutations as the questions, such as the origin of eukaryotes (13) and most tractable genetic variant for phylogenetic analyses. the universal tree of life (14). To our knowledge, this Point mutations are highly important in the modulation is the first application of rigorous phylogenetic analyses of many different tumorigenic pathways, by either to cancer mutation data sets.

280 Mol Cancer Ther; 9(2) February 2010 Molecular Cancer Therapeutics

Downloaded from mct.aacrjournals.org on September 25, 2021. © 2010 American Association for Cancer Research. Published OnlineFirst February 2, 2010; DOI: 10.1158/1535-7163.MCT-09-0508

Cancer Cell Line Evolution

Materials and Methods mutations across 452 genes. The Tykiva Database (6) of the Institute of Singapore6 provides amino DNA Sequencing acid sequences of cancer cell line mutations (DNA Our core data set for phylogenetic analyses was derived sequences are unavailable). A total of 133 amino acid from novel DNA sequencing of exons from 55 selected on- variants in 59 genes across 60 common cell lines were cogenes, or genes otherwise implicated in tumor prolifer- obtained from the Tykiva Database. ation, from 353 cancer cell lines. Supplementary Table S1 Because each of the databases used different wild-type lists the cell lines and their tissue of origin along with the human reference sequences to determine tumor cell line genes sequenced in this study. All cell lines were obtained variants (called here dbRef), we adopted a protocol to from either the American Type Culture Collection or other standardize variants against a common wild-type human public repositories. Both cDNAs and genomic DNA were reference. As our standard wild-type or wtRef for new sequenced. Total RNA from cancer cell lines was isolated DNA sequences generated in this study as well as se- using a modified RNeasy kit (Qiagen, Inc.) and converted quences imported from cancer mutation databases, we into cDNA using the First-Strand cDNA Synthesis kit us- used human gene sequences from the NCBI Reference ing oligo(dT) primers (Roche Diagnostics). Genomic Sequence collection (August 2008). From the COSMIC DNA was prepared using the Maxwell-16 DNA Purifica- database, we extracted information of the exact position tion kit (Promega). For genomic DNA, all exons were PCR of each reported point mutation, its corresponding dbRef amplified by designing oligonucleotide primers within type nucleotide, and its position in wtRef sequences. To flanking intronic regions. Approximately 2 kb of 5′- ensure the correctness of the data extraction, we compared untranslated region and 1 kb of 3′-untranslated region the extracted dbRef nucleotide with the corresponding areas were also covered by sequencing. PCR primers were wtRef nucleotide. Furthermore, we confirmed that the tailed with M13 universal forward and reverse sequencing extracted three-nucleotide codons were identical in both primer sequences. All primers have been tested on Human wtRef and dbRef sequences. Any differences were investi- Genomic DNA (Promega) and BD qPCR Human Reference gated and rejected if unresolved. If the wtRef and dbRef cDNA, random-primed (BD Biosciences), respectively, codons were identical, then the variant codon or amino before using on cell line samples. PCRs were carried out acid recorded in that database was retained. For the inclu- using HotStarTaq DNA polymerase (Qiagen). DNA was sion of a codon in our multiple sequence alignments, at amplified for 35 cycles at 95°C for 20 s, 55°C for 30 s, and least one variant had to be observed in one of the cell lines. 72°C for 45 s, followed by 7-min extension at 72°C. For heterozygous genes, we used the variant allele to in- Before DNA sequencing, all PCR products were purified crease phylogenetic signal. using AmPure (Agencourt Bioscience Corp.). Direct To build the concatenated sequence alignments, we sequencing of purified PCR products was done with added the variant codons in sequential order such that BigDye Terminator Cycle Sequencing kit (v1.1; Applied each aligned nucleotide position was homologous across Biosystems) followed by purification using CleanSeq all cell lines. For those genes with missing data, the wtRef (Agencourt Bioscience). Sequencing was done using an Ap- codon was used to build a complete multiple sequence plied Biosystems Genetic Analyzer 3730XL. All sequence alignment without gaps. As the outgroup, a complete data were assembled and analyzed using Aligner software wtRef codon sequence was added to the cell line multiple (CodonCode Corp.), and sequence variants were confirmed sequence alignment. This produced a multiple sequence by independent PCR amplifications and sequencing. alignment of 3,252 nucleotides for 353 cell lines plus nu- cleotide wtRef. Before phylogenetic analysis, cell lines Construction of Multiple Sequence Alignments with 100% identity in nucleotide sequence were reduced Figure 1 shows a flow diagram of the DNA sequence to a single representative (see Results and Discussion for analysis pipeline. For the cell lines we sequenced, com- a list of identical cell lines). Thus, phylogenetic analyses parisons to the National Center for Biotechnology Infor- were done on the distinct sequences from 321 cell lines, mation (NCBI) Human RefSeq (August 2008) revealed including the wtRef sequence. 2,777 different nucleotide variations. No distinction was A similar approach was used to construct the amino made between germline single nucleotide polymorph- acid sequence alignment with some additional steps. isms (SNP) or somatic tissue tumor-specific mutations “ ” First, all variant codon positions from our DNA sequence (herein collectively called variants ). To augment our data as well as that of the COSMIC database were trans- sample with additional gene sequences, overlapping cell lated to the corresponding amino acid. An additional 133 line collections were identified in two public repositories amino acid variants found in 59 genes were added from of cancer mutations. Of the 353 cell lines sequenced, 229 the Tykiva resource. The wtRef nucleotide sequence was were also found to have nucleotide-level mutations re- also translated into amino acids. After consolidating cell corded in the database COSMIC (15), a comprehensive lines with identical amino acid sequences, the final pro- source of cancer mutations maintained by the Sanger tein multiple sequence alignment was 1,170 amino acids Centre,5 from which we extracted an additional 922

5 http://www.sanger.ac.uk/genetics/CGP/cosmic/ 6 http://tykiva.bii.a-star.edu.sg/SOGdb/cgi-bin/sogweb.pl

www.aacrjournals.org Mol Cancer Ther; 9(2) February 2010 281

Downloaded from mct.aacrjournals.org on September 25, 2021. © 2010 American Association for Cancer Research. Published OnlineFirst February 2, 2010; DOI: 10.1158/1535-7163.MCT-09-0508

Zhang et al.

for 292 cell lines, including an amino acid sequence ver- been widely used for biomarker discovery and biological sion of wtRef. process elucidation, particularly for cancer transcrip- tomic studies (25–28). PLS-DA analysis was done using Phylogenetic Analysis SIMCA-P+ 11.5 (Umetrics). Cross-validation was done For the nucleotide sequence alignments, maximum according to default software settings, except for increas- likelihood (ML) tree topologies were constructed using ing the number of permutations from 7 to 100. A total of the software GARLI v0.96 (16, 17). Estimation of rate het- 398 probe sets (PLS-DA gene set) were selected based on erogeneity was done with the γ distribution model. Sixty their high variable importance for the projection (VIP) va- initial runs of GARLI were carried out to ensure similar lues. As a cross-check of PLS-DA results, the data were likelihood scores were reached. Subsequently, another 60 also analyzed using Student's t test. The P value was cor- runs were made to evaluate the consistency of tree topol- rected by Benjamini-Hochberg procedure with a false dis- ogies with improved parameters to increase the intensity covery rate of <0.05 as the significance threshold (29). of the searches (attachmentspertaxon = 200, genthresh- Pathway enrichment analyses were conducted with Me- fortopoterm = 80,000, numberofprecreductions = 40, oth- taCore (GeneGo, Inc.; ref. 30). er parameters default). Nucleotide sequence phylogenies were also reconstructed using Bayesian posterior proba- Selection Pressure Analysis bilities (BP) as implemented by the software MrBayes The branch-site model (31, 32) implemented in the CO- v3.0B4 (18, 19). Bayesian analysis also used the γ-distrib- DEML program from the PAML package (33, 34) was used uted rate model with six discrete rate categories. Markov to test for positive selection. We tested each of the branches chains were run for 106 generations, burn-in values were on the cell line phylogeny, treating each in turn as the fore- set for 104 generations, and trees were sampled every 100 ground branch, with all the other branches specified as generations. background branches. Likelihood ratio tests were done We constructed amino acid–based phylogenetic trees with the Bonferroni correction for multiple testing (35). using BP and distance neighbor-joining methods. BP The alternative branch-site model has four codon site cate- was done as described for nucleotide alignments but gories: the first two for sites evolving under purifying se- with the mixed model for the amino acid rate matrix. lection and neutral selection on all the lineages and the Neighbor-joining trees were based on pairwise distances additional two categories for sites under positive selection between amino acid sequences using the programs on the foreground branch. The null model restricts sites on NEIGHBOR and PROTDIST (Dayhoff option) of the the foreground lineage to be undergoing neutral evolution. PHYLIP 3.6 package.7 All reconstructed phylogenetic Each branch-site model was run thrice. At least two of three trees were visualized using the programs TreeView replicate runs of each model should converge at or within v1.6.6 (20) and v2.2.2 (21). 0.001 of the same log-likelihood value. Runs that did not converge indicated problems with the data and were rerun mRNA Expression and Pathway Analysis until convergence was obtained or else reported as a con- To determine the relative differences in mRNA expres- vergence problem. sion between clade A (see Results and Discussion) and other cell lines, we analyzed mRNA microarray profiles Results and Discussion previously generated for these cell lines by GlaxoSmithK- line, which are available from the public repository caBIG We constructed an initial data set from an in-house (22). Transcript profiling data as well as background in- DNA sequencing effort of 55 known oncogenes and tumor formation for each cell line microarray experiment sam- suppressor genes from 353 cancer cell lines (Fig. 1). The se- ple are available online.8 Because large numbers of lected cell lines represent a significant proportion of tumor peripheral blood cell lines fall outside of clade A, inclu- cell lines commonly used in cancer research, including the sion of peripheral blood cell lines could bias results to- National Cancer Institute collection. Multiple tissue ward genes that simply differentiate solid tumor from sources were represented in this cancer cell line sample, peripheral blood cancer cell lines. Thus, we focused our with the top four tumor types being breast (n = 32), colon gene expression analysis exclusively on solid tumor cell (n = 26), lung (n = 38), and lymphomas (n = 88; classified as lines. For the statistical analysis of gene expression data, peripheral blood in Supplementary Table S1). we used partial least squares discriminant analysis (PLS- Among the cell lines we sequenced, comparisons to DA; ref. 23) for class comparison (24) to identify genes Human RefSeq (retrieved from NCBI human genome with the most discriminating probe sets between clade build 36.3, August 2008) revealed 2,777 different nucleo- Aandnon–clade A solid tumor cell lines. PLS-DA has tide variations that were either germline SNPs or somatic tissue tumor-specific mutations, herein collectively called variants. To augment our sample with further gene se- quences, overlapping cell line collections were identified 7 J. Felsentein. PHYLIP (Phylogenetic Inference Package) [3.6]. Seattle in two public repositories of cancer mutations. The (WA): Department of , University of Washington; 2000, unpub- lished work. Sanger Centre COSMIC (15) resource, a comprehensive 8 https://cabig.nci.nih.gov/caArray_GSKdata/ database of cancer mutations, had 229 of the 353 cell lines

282 Mol Cancer Ther; 9(2) February 2010 Molecular Cancer Therapeutics

Downloaded from mct.aacrjournals.org on September 25, 2021. © 2010 American Association for Cancer Research. Published OnlineFirst February 2, 2010; DOI: 10.1158/1535-7163.MCT-09-0508

Cancer Cell Line Evolution

Figure 2. Phylogenetic tree of tumor cell lines based on DNA sequences. Best ML (ML value = −ln19311.744) tree of 320 unique tumor cell lines outgroup rooted with the human RefSeq (wtRef). Based on a multiple sequence alignment of 3,252 nucleotides derived from the concatenation of variant codons, the tree was reconstructed using the GARLI package (see Materials and Methods; ref. 16). Cell lines are color coded by tumor tissue type as listed, along with legend abbreviations, in Supplementary Table S1. Identical cell lines are listed at terminal nodes separated by commas. Solid red bar indicates the terminal node for the clade A subtree, which was consistently obtained in 60 randomized, replicate ML analyses. Red arc line shows the range of cell lines included in clade A. Also indicated are other nodes supported in 50% to 69% (+) and 70% to 100% (*) replications. Those nodes supported by 0.8 to 1.0 probabilities in Bayesian tree reconstruction using the software MrBayes (19) are marked with “!.” Solid blue bar indicates an example of a well-supported subcluster of cell lines from diverse tissues of origin (discussed in text). Supplementary Fig. S1 is a PDF version of the same phylogenetic tree in vertical phylogram format.

www.aacrjournals.org Mol Cancer Ther; 9(2) February 2010 283

Downloaded from mct.aacrjournals.org on September 25, 2021. © 2010 American Association for Cancer Research. Published OnlineFirst February 2, 2010; DOI: 10.1158/1535-7163.MCT-09-0508

Zhang et al.

Figure 3. Best ML phylogenetic tree of clade A cell lines based on nucleotide sequences (ML value = −ln11135.2836). Phylogenetic methods, outgroup rooting, and tree labeling are given in Fig. 2. The tissue of origin for each cell line is indicated by color and suffix abbreviation.

from which we extracted an additional 922 mutations. For every gene, codon or amino acid residues that dif- The Tykiva Database (6) provided amino acid sequences fered from the respective “wild-type” Human RefSeq with 133 additional variants for those cancer cell lines (herein called wtRef) were included in the nucleotide or (DNA sequences are unavailable). protein multiple sequence alignments, respectively. To

284 Mol Cancer Ther; 9(2) February 2010 Molecular Cancer Therapeutics

Downloaded from mct.aacrjournals.org on September 25, 2021. © 2010 American Association for Cancer Research. Published OnlineFirst February 2, 2010; DOI: 10.1158/1535-7163.MCT-09-0508

Cancer Cell Line Evolution

build a complete data matrix, wherever the variant cally well-supported subcluster (indicated in Fig. 2 by a codon sequence was unavailable for a particular cell line, solid blue bar) is composed of 11 tumor cell lines, includ- the wtRef codon sequence was assumed. Codons were ing 7 lymphomas (lymphocytes and lymphoblast), 2 ovar- then concatenated into a single multiple sequence align- ian tumors (adenocarcinoma of endometrium ovarian ment composed of 3,252 nucleotides for 353 cell lines. For tissue), and 2 myeloid leukemias (bone marrow). proteins, the codons were translated from the nucleotide Extensive DNA sequence heterogeneity in common tu- alignment with additional amino acid variants from the mor types sampled from different patients or cell lines Tykiva database to give a final alignment of 1,170 amino has been previously observed in large-scale cancer ge- acids. Among all cancer cell lines, the proportion of dif- nome surveys. Such studies have highlighted pathways ferent nucleotides and amino acids ranged from 0% to apparently important for the transformed phenotype. 7.9% and 0% to 6.6%, respectively. For example, a large-scale gene survey of glioblastomas Six cell lines had nucleotide sequences that were 100% found higher than expected incidences of ERBB2 muta- identical to the wtRef sequence. There were another 16 tions, which had been previously thought to be more identity groups, ranging from two to six cell lines, several commonly associated with lung, gastric, and colon can- of which included multiple tumors types (Supplementa- cers (4). Other profiling methods, such as DNA finger- ry Table S2). For example, tumor cell lines originating prints, have often been unable to distinguish cell line from patients with Hodgkin's disease (L-428 and RPMI by tissue of origin (10). Our study shows that phyloge- 6666), chronic myeloid leukemia (MEG-01), squamous netic analysis can assist in quantifying and visualizing cell cervical carcinoma (SiHa), breast ductal carcinoma the complex variability of the cancer genome, thus facil- (UACC-812), and retinoblastoma (Y79) all share identical itating further exploration of relationships between par- nucleotide sequences. In our cell line sample, MDA-MB- ticular collections of tumor samples. 435 and M14 as well as U251 and SNB-19 were also An exceptional phylogenetic structure in all ML runs previously reported to have identical DNA fingerprints was a cluster composed of up to 91 cell lines, which we will and thus could have common donor origins (10). For refertoascladeA(indicatedinFig.2andthebestML those occurrences of two or more identical cell lines, scored clade A subtree shown in Fig. 3). The clustering of a single representative was selected for the multiple se- 83 of these cell lines was consistently supported in >80% of quence alignments. The resulting phylogenetic data sets 60 separate ML replicates (Supplementary Table S5). The of 321 unique nucleotides and 292 unique protein se- existence of the clade A node, as well as several other inter- quences included the respective nucleotide or amino ac- nal nodes, was also significantly supported in BP phyloge- id wtRef outgroup (Supplementary Tables S3 and S4, netic reconstructions. All main tumor types as well as respectively). several rare types are represented in clade A. To determine Phylogenetic analysis of the cancer cell line data set is if the nucleotide tree topology reflected nonsynonymous challenging because many gene variants were repre- changes, we also reconstructed a protein-based phyloge- sented in low frequencies in the overall data set. The pat- netic tree (Supplementary Fig. S2) for all cell lines with ami- tern of nucleotide variation and the large number of cell no acid sequences (320 cell lines and 1,170 amino acids). lines meant certain phylogenetic reconstruction methods Similar to the nucleotide tree, very few clusters in the pro- were less suitable for determining tree topologies, in part tein tree were composed of cell lines with common tissue due to their sensitivity to errors involving unequal rates types. Although some associations among cell lines in the of evolutionary changes among lineages. Therefore, we nucleotide tree were not recapitulated in the protein tree, used the ML method as implemented by the software the majority of clade A cell lines clustered together, which GARLI (16), which has a rapid algorithm allowing for suggests that most variants defining this group of cell multiple reiterations to optimize parameters and assess lines are nonsynonymous changes and thereby could re- tree topology robustness. flect potentially functional changes at the protein level. The best phylogenetic tree (lowest ML value) for com- As shown in the clade A tree, several cell lines, partic- bined new DNA sequence and public nucleotide data is ularly those derived from lung, breast, and colon can- shown in Fig. 2 (shown in Supplementary Fig. S1 as a ver- cers, have especially long branches reflecting more tical phylogram with branch lengths). A similar tree to- numerous DNA variants. Higher levels of variation can pology was recovered in separate phylogenetic analyses result from either elevated mutational events in those tu- restricted to the new DNA sequences generated in this mor cell lines or sampling bias toward particular tumor study alone (data not shown). The most striking aspect types. We feel that the latter explanation is less likely of the tree is the overall lack of clustering with respect becausegenesequencedatawereavailableacrossa to tissue of origin or cancer type, which suggests that wide spectrum of tumor tissue types. Thus, the longer common nucleotide variants can occur in multiple tumor branches leading to various lung cancers, for example, types. In several clusters, cell lines derived from liquid tu- are reflective of the higher mutation rate in that particular mors, such as lymphomas and leukemias, co-occurred tumor type for the genes studied. This does not exclude with solid tumors, suggesting that both of these broad tu- the possibility that there are other genes yet to be se- mor types might be dependent on common mutations for quenced, which are commonly mutated in other cancers. tumorigenesis and metastasis. For example, one statisti- Previous studies suggest that early- and late-stage

www.aacrjournals.org Mol Cancer Ther; 9(2) February 2010 285

Downloaded from mct.aacrjournals.org on September 25, 2021. © 2010 American Association for Cancer Research. Published OnlineFirst February 2, 2010; DOI: 10.1158/1535-7163.MCT-09-0508

Zhang et al.

tumors can accumulate different mutations, which could tionally, some methods of phylogenetic tree reconstruc- also account for differences in mutation rates (36, 37). tion are highly sensitive to taxa with unequal mutation However, in our study, we examined cell lines that, for rates and exceptional long branches will tend to cocluster the most part, likely best model late-stage cancers. Addi- as tree artifacts (38). By using ML methods, which are

Table 1. Variant codons occurring in ≥10 clade A cell lines at frequencies of ≥0.5

Gene Amino acid position Codon Variant occurrence Ratio clade A: No. clade A cell lines Total no. cell lines total cell lines

AKT1 242 GAG 35 35 1.00 AURKA 31 TTT 16 16 1.00 AURKA 57 ATT 10 10 1.00 AURKB 295 TCC 20 20 1.00 BAD 96 CGC 18 18 1.00 BIRC5 152 GAG 40 42 0.95 BUB1B 349 CAA 17 17 1.00 BUB1B 388 GCG 17 17 1.00 CDKN1A 31 AGC 12 12 1.00 CENPE 81 ACT 12 12 1.00 CENPE 338 GTA 12 12 1.00 CENPE 675 CAG 24 24 1.00 CENPE 1,535 TTT 11 11 1.00 CENPE 1,911 AGC 11 11 1.00 CENPE 2,090 ACG 10 10 1.00 CHFR 254 CCA 19 19 1.00 CHFR 528 TTG 20 20 1.00 CHFR 539 GTG 14 14 1.00 FOXO3A 53 GCC 19 20 0.95 FRAP1 479 GAT 64 64 1.00 FRAP1 999 AAC 65 65 1.00 FRAP1 1,577 GCG 61 61 1.00 FRAP1 1,851 AGC 27 27 1.00 FRAP1 2,303 CTG 25 25 1.00 INCENP 36 GAG 27 27 1.00 INCENP 120 GTT 28 28 1.00 INCENP 506 ATG 19 19 1.00 MAP2K2 64 GTC 10 10 1.00 MAP2K2 151 GAC 27 27 1.00 MAP2K2 220 ATA 51 52 0.98 PIK3CA 1,047 CAT 9 15 0.60 PIK3CD 936 TAC 21 21 1.00 PIK3CG 327 GAT 11 11 1.00 PIK3CG 442 TCC 18 18 1.00 PIK3CG 675 AGC 35 35 1.00 PIK3R1 73 TAC 35 35 1.00 PIK3R1 326 ATG 24 24 1.00 PIK3R2 313 CCC 10 10 1.00 PIK3R2 637 AGT 45 45 1.00 RPS6KB2 269 TTC 54 54 1.00 RPS6KB2 420 GCC 58 58 1.00 TSC1 322 ATG 20 20 1.00 TSC1 445 GAA 21 21 1.00 TSC2 860 TTT 10 10 1.00 TSC2 1,667 GAT 33 33 1.00 TSC2 1,732 TCG 12 12 1.00

286 Mol Cancer Ther; 9(2) February 2010 Molecular Cancer Therapeutics

Downloaded from mct.aacrjournals.org on September 25, 2021. © 2010 American Association for Cancer Research. Published OnlineFirst February 2, 2010; DOI: 10.1158/1535-7163.MCT-09-0508

Cancer Cell Line Evolution

Table 2. Pathways enriched in MetaCore for both clade A genes (variants) and PLS-DA gene set (expression) based on hypergeometric P value of <0.05

Pathway P Source

Apoptosis and survival_BAD phosphorylation 1.32E−09 Variants 0.01061 Expression Cell cycle_The metaphase checkpoint 5.76E−10 Variants 0.007398 Expression Cell cycle_Chromosome condensation in prometaphase 0.002715 Variants 0.04408 Expression Cell cycle_Role of APC in cell cycle regulation 0.01052 Variants 0.03767 Expression Signal transduction_AKT signaling 1.47E−15 Variants 0.01209 Expression Development_PIP3 signaling in cardiac myocytes 3.53E−15 Variants 0.001808 Expression Development_IGF-RI signaling 7.83E−15 Variants 0.031189 Expression G protein signaling_EDG5 signaling 0.0368 Variants 0.04996 Expression G protein signaling_G Protein α-12 signaling pathway 0.01146 Variants 0.04157 Expression Development_EDG3 signaling pathway 6.37E−06 Variants 0.000108 Expression Development_Mu-type opioid receptor signaling 3.85E−06 Variants 0.008602 Expression Immune response_Role of DAP12 receptors in NK cells 0.03142 Variants 0.000531 Expression Neurophysiologic process_HTR1A receptor signaling in neuronal cells 0.007976 Variants 0.02727 Expression less sensitive to long-branch effects, we have minimized were represented disproportionately higher (>50%) in the introduction of this bias in our phylogeny. clade A cell lines. These variants occurred in genes in- Clade A cell lines had several unique variant combina- volved with chromosome segregation and cell prolifera- tions distinguishable from other cell lines (Table 1). As an tion (AURKA, BUB1B, INCENP, CENPE, CDKN1A,and example, the gene RPS6KB2 encodes ribosomal protein CHFR), phosphoinositide 3-kinase/mammalian target S6 kinase (70 kDa, polypeptide 2 isoform), which func- of rapamycin pathway genes (TSC1, PIK3CA, PIK3CG, tions downstream of the rapamycin-sensitive mammali- PIK3R1, and PIK3R2), the oncogene KRAS, and the tumor an target signaling pathway involved in cell growth suppressor TP53. and proliferation (39). RPS6KB2 is also one of the most We looked for significant differences in mRNA ex- variable genes in our analysis having both synonymous pression between clade A and other cell lines using da- and nonsynonymous nucleotide variants. Clade A ta previously released to the caBIG database (22) by included all 58 cell lines with one particular RPS6KB2 GlaxoSmithKline.9 PLS-DA was used to identify genes variation, A420V. This variant, a known human polymor- with the most discriminating probe sets between clade phism (RS13859), is also suggested to be under positive A and non–clade A solid tumor cell lines. A total of 398 selection pressure from our analysis (described below). probe sets (PLS-DA gene set) were selected based on Another example is the gene BIRC5, a member of the their high VIP values. Pathway enrichment analyses inhibitor of apoptosis gene family, which negatively reg- on the PLS-DA gene set and clade A commonly variant ulates apoptotic cell death. There are multiple splice var- genes returned many of the same pathways (Table 2). iants of BIRC5, including survivin-2B, which is highly In addition, regulation of translation initiation was sig- expressed during fetal development and in many tumors nificantly enriched in PLS-DA gene expression set (P = but not expressed in normal adult tissue (40). The BIRC5 2.706E−03), which is a downstream pathway for several variant E152E/K, which occurred in 42 cell lines, includ- variant genes. ing 40 clade A members, also tested significant for posi- tive selection. Many nonsynonymous variants that occurred in 10 or more cell lines (across all 353 cell lines) 9 https://cabig.nci.nih.gov/caArray_GSKdata/

www.aacrjournals.org Mol Cancer Ther; 9(2) February 2010 287

Downloaded from mct.aacrjournals.org on September 25, 2021. © 2010 American Association for Cancer Research. Published OnlineFirst February 2, 2010; DOI: 10.1158/1535-7163.MCT-09-0508

Zhang et al.

Table 3. Genes with amino acids under positive selection in clade A and all tumor cell lines

Tissue Positive SNP ID Gene Occurrence selected sites in cell lines Clade A All

Brain I391M rs3729680 PIK3CA 10 49 R273H*/Y/C rs28934576* TP53 824 Breast E152K BIRC5 40 42 Q349R rs18011376 BUB1B 17 17 H1047R/L PIK3CA 915 Colon E152K BIRC5 40 42 G12V/C/D/A KRAS 22 52 G13D KRAS 49 R248W/Q* rs11540652* TP53 728 R273H rs28934576 TP53 824 Bone R72P rs1042522 TP53 34 144 Lung G12V/CS/R/F KRAS 22 52 I391M rs3729680 PIK3CA 10 49 R72P rs1042522 TP53 34 144 Peripheral tissues G13D/A/V/L KRAS 49 G12D/V/N/C NRAS 15 Q61L/K/H† rs11554290† NRAS 419 I391M rs3729680 PIK3CA 10 49 R72P*/R rs1042522* TP53 34 144 R248Q*/W/stop rs11540652* TP53 728 Prostate E152K BIRC5 40 42 Q349R rs1801376 BUB1B 17 17 M506T rs2277283 INCENP 19 19 G12V KRAS 22 52 S442Y rs17847825 PIK3CG 18 18 P313S rs1011320 PIK3R2 10 10 Kidney I391M rs3729680 PIK3CA 10 49 Skin V600E/D BRAF 836 Q61K/R* rs11554290* NRAS 419 M326I rs3730089 PIK3R1 24 24 A420V rs13859 RPS6KB2 58 58 R72P rs1042522 TP53 34 144 Ovarian R72P rs1042522 TP53 34 144 Pancreas G12V/D/C KRAS 22 52 M326I rs3730089 PIK3R1 24 24 R72P rs1042522 TP53 34 144 R248Q*/W rs11540652* TP53 728 I255N/T TP53 14 M322T rs1073123 TSC1 20 20 Other G12D/R/C KRAS 22 52 I391M rs3729680 PIK3CA 10 49 G245S TP53 18 R248Q*/L/W rs11540652* TP53 728

NOTE: Based on PAML likelihood scores (see Supplementary Methods File 1), all tumor tissue types, where genes had amino acid sites under significant positive selection (BP > 0.95), are shown. Bladder and liver tumor-derived cell lines were also tested but do not have any significant positive selection sites. *Positive selection sites, which are also sites of known SNPs and their corresponding SNP ID. †For NRAS, positive selection site Q61 is a site of a known SNP, but mutations at this site were not of the SNP variant (Q61R).

288 Mol Cancer Ther; 9(2) February 2010 Molecular Cancer Therapeutics

Downloaded from mct.aacrjournals.org on September 25, 2021. © 2010 American Association for Cancer Research. Published OnlineFirst February 2, 2010; DOI: 10.1158/1535-7163.MCT-09-0508

Cancer Cell Line Evolution

Student's t test (false discovery rate of 5%) confirmed morphic individuals—a hypothesis supported by two that 394 of the 398 PLS-DA genes had an adjusted P value of the SNP variants. The variant Q349R in the gene of ≤0.05. Furthermore, 192 of 398 top PLS-DA VIP genes BUB1B is associated with chronic lymphocytic leukemia were also among the top t test genes sorted by adjusted incidence (43). The P72R polymorphism in the gene TP53 P value. Pathway enrichment analysis of 398 top t test has been shown to affect risk of cervical cancer (44), and a genes based on adjusted P values yielded similar results meta-analysis of all types of cancer estimated a pooled to PLS-DAVIP pathways, although the latter were in gen- weak cancer risk in p53-P72 homozygotes (45, 46). The eral better agreement with mutation-based pathways. frequencies of some of variants in the cancer cell lines The differential mRNA expression of genes in clade A are higher than expected for human populations accord- cell lines relative to other cell lines suggests that phyloge- ing to HapMap. This could be due to the cancer cells hav- netic clustering of tumor cell lines based on DNA variants ing somatically mutated to the other allele during cell might recapitulate functional similarities in cellular path- line culture or by originating from individuals with ways. We can also exclude the possibility that the genes germline SNPs more likely to develop tumors. Further commonly mutated in clade A are substantially per- caution should be used in the interpretation of positive turbed by other mutational changes (e.g., large-scale selection by the branch-site method used by PAML, genomic aberrations such as deletions, amplifications, which has been recently challenged as overestimating or rearrangements) in the wild-type cell lines. If this sites with potential functional changes (47). Thus, further was the case, we would not expect differential pathway sequencing of the positively selected polymorphic gene expression patterns between the two groups. sites in tumors seems warranted to determine the clinical Determining the critical mutations that contribute relevance of these variants. to the malignant cellular phenotype is challenging. Our phylogenetic and positive selection analysis Beerenwinkel et al. (41) estimated that colorectal cancers approaches to the study of DNA variants in human may carry ∼100 nonsynonymous mutations, of which cancer cell lines have several important caveats. First, it perhaps as few as 3 may be sufficient for developing can- is optimal to have a complete data matrix with sequence cer. These critical mutations are commonly termed driver variant data available across all tumor samples. Given mutations. Conversely, those mutations that are accu- the low frequency of individual gene mutations, charac- mulated by the tumor cell but do not seem to confer a terization of variants across hundreds of genes is re- growth advantage are called passenger mutations (these quired for robust phylogenetic trees. Fortunately, such can occur in an accelerated manner as a consequence of genome-wide DNA sequence data are becoming avail- impaired DNA repair and apoptosis processes). Evolu- able through several large-scale cancer genome projects tionary methods may have the capability to discriminate now under way. Second, although point mutations are between passenger and driver mutations (42). In an effort very important in tumorigenesis, larger-scale DNA aber- to highlight putative driver mutations within the phylog- rations are also key mutational events in many cancers. eny, we classified cell lines by tissue of origin. Nucleotide Potentially, reconstruction of comprehensive phylogenies alignments for each tissue group were analyzed to detect of tumor might be possible using combined da- evidence for evolution influenced by positive selection ta models of both point mutations and genomic aberra- pressures using the PAML algorithms (33, 34). We found tions as has been done for mixed DNA sequence and significant evidence for the influence of positive selection morphologic data sets (48). Third, sequenced genes in pressure acting on several sites in the concatenated our data set are biased toward signal transduction path- alignment for most tissue groups. ways known to be perturbed in cancer. Broader DNA se- For those tissue groups that returned a significant test quencing surveys involving additional genes may well for positive selection, a Bayesian analysis was used to alter the tree topology. Finally, the genetic variation be- identify the sites responsible for the signal with posterior tween cancer cell lines, as well as reflecting variation probability of >0.95. These sites might be driver muta- in human tumors, could also be the result of germline tions conferring growth or survival advantage on cancer differences between the cell line donors and changes cells (Table 3). Included are known oncogenic mutations acquired during cell culture. that are highly represented in COSMIC (15), such as Here, we show that evolutionary analyses can illumi- activating mutations at G12 position of the KRAS gene nate commonalities in tumorigenic mechanisms among in 73% of primary pancreatic ductal carcinomas and diverse cancers—this finding has several important im- activating mutations at V600 position of the BRAF gene plications for future cancer research. First, phylogenetic occurring in 33% of primary malignant melanomas. trees provide a direct and visual means for understand- Interestingly, 12 of the 20 positively selected mutations ing the relationships between tumor types based on are known germline SNPs as identified in the NCBI complex data sets as well as a statistical basis for their dbSNP database (Table 3). These SNPs are present in classification. Previous results have been reported in tab- normal tissues, although our analysis suggests that they ular formats (i.e., mutated protein kinases; ref. 6), which are selected for in cell culture and possibly confer some are less tractable for identifying tumor type relationships cellular growth advantage. Therefore, we hypothesize across large multiple gene variant data sets. Second, tu- that they may also confer cancer susceptibility in poly- mor cell lines are key tools in the preclinical screening of

www.aacrjournals.org Mol Cancer Ther; 9(2) February 2010 289

Downloaded from mct.aacrjournals.org on September 25, 2021. © 2010 American Association for Cancer Research. Published OnlineFirst February 2, 2010; DOI: 10.1158/1535-7163.MCT-09-0508

Zhang et al.

potential compounds and biological agents for anticancer Disclosure of Potential Conflicts of Interest activity. By knowing the homologous relationships of tu- mor cell lines based on genome-wide variant data, more No potential conflicts of interest were disclosed. rational selection of cell lines can be made for drug screen- ing campaigns. Furthermore, by screening cell lines from Acknowledgments different tissues of origin that might cluster together be- cause of similar DNA variation patterns, new indications WethankC.Traini,E.Thomas,K.Allen,andA.Hughesfor for novel drugs can be discovered preclinically. Finally, so- DNAsequencing;D.Zwickl,N.J.Wickett,P.K.Wall,andJeremyM. Martin for computational consulting; C.W. dePamphilis for support of phisticated DNA-based diagnostic tests to identify specific Y. Zhang; and R. Wooster and three anonymous reviewers for their target gene mutations are being increasingly used to de- comments on this manuscript. The costs of publication of this article were defrayed in part by the velop clinical treatment regimens for cancer patients payment of page charges. This article must therefore be hereby marked (49). As these data are collected across more genes and pa- advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate tients, evolutionary approaches can be used to classify this fact. clinical tumor samples based on molecular similarities Received 6/11/09; revised 11/11/09; accepted 12/22/09; published and potentially guide therapeutic decisions. OnlineFirst 2/2/10.

References 1. Sjoblom T, Jones S, Wood LD, et al. The consensus coding 18. Huelsenbeck JP, Ronquist F. MRBAYES: Bayesian inference of sequences of human breast and colorectal cancers. Science 2006; phylogenetic trees. Bioinformatics 2001;17:754–5. 314:268–74. 19. Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic 2. Jones S, Zhang X, Parsons DW, et al. Core signaling pathways in inference under mixed models. Bioinformatics 2003;19:1572–4. human pancreatic cancers revealed by global genomic analyses. 20. Page RD. TreeView: an application to display phylogenetic trees on Science 2008;321:1801–6. personal computers. Comput Appl Biosci 1996;12:357–8. 3. Parsons DW, Jones S, Zhang X, et al. An integrated genomic analy- 21. Huson DH, Richter DC, Rausch C, Dezulian T, Franz M, Rupp R. sis of human glioblastoma multiforme. Science 2008;321:1807–12. Dendroscope: an interactive viewer for large phylogenetic trees. 4. The Cancer Genome Atlas Research Network. Comprehensive BMC Bioinformatics 2007;8:460. genomic characterization defines human glioblastoma genes and 22. Fenstermacher D, Street C, McSherry T, Nayak V, Overby C, core pathways. Nature 2008;455:1061–8. Feldman M. The Cancer Biomedical Informatics Grid (caBIG). Conf 5. Sjoblom T. Systematic analyses of the cancer genome: lessons Proc IEEE Eng Med Biol Soc 2005;1:743–6. learned from sequencing most of the annotated human protein- 23. Musumarra G, Barresi V, Condorelli DF, Fortuna CG, Scire S. coding genes. Curr Opin Oncol 2008;20:66–71. Genome-based identification of diagnostic molecular markers for 6. Ruhe JE, Streit S, Hart S, et al. Genetic alterations in the tyrosine human lung carcinomas by PLS-DA. Comput Biol Chem 2005;29: kinase of human cancer cell lines. Cancer Res 2007; 183–95. 67:11368–76. 24. Simon R, Radmacher MD, Dobbin K, McShane LM. Pitfalls in the use 7. Greaves M. Darwinian medicine: a case for cancer. Nat Rev Cancer of DNA microarray data for diagnostic and prognostic classification. 2007;7:213–21. J Natl Cancer Inst 2003;95:14–8. 8. Nishizuka S, Charboneau L, Young L, et al. Proteomic profiling of the 25. Musumarra G, Barresi V, Condorelli DF, Scire S. A bioinformatic ap- NCI-60 cancer cell lines using new high-density reverse-phase lysate proach to the identification of candidate genes for the development microarrays. Proc Natl Acad Sci U S A 2003;100:14229–34. of new cancer diagnostics. Biol Chem 2003;384:321–7. 9. Shankavaram UT, Reinhold WC, Nishizuka S, et al. Transcript 26. Perez-Enciso M, Tenenhaus M. Prediction of clinical outcome and protein expression profiles of the NCI-60 cancer cell panel: an with microarray data: a partial least squares discriminant analysis integromic microarray study. Mol Cancer Ther 2007;6:820–32. (PLS-DA) approach. Hum Genet 2003;112:581–92. 10. Lorenzi PL, Reinhold WC, Varma S, et al. DNA fingerprinting of the 27. Modlich O, Prisack HB, Munnes M, Audretsch W, Bojar H. Predic- NCI-60 cell line panel. Mol Cancer Ther 2009;8:713–24. tors of primary breast cancers responsiveness to preoperative 11. Suda K, Onozato R, Yatabe Y, Mitsudomi T. EGFR T790M mu- epirubicin/cyclophosphamide-based chemotherapy: translation of tation: a double role in lung cancer cell survival? J Thorac Oncol microarray data into clinically useful predictive signatures. J Transl 2009;4:1–4. Med 2005;3:32. 12. Snead JL, O'Hare T, Eide CA, Deininger MW. New strategies for the 28. Man MZ, Dyson G, Johnson K, Liao B. Evaluating methods for first-line treatment of chronic myeloid leukemia: can resistance be classifying expression data. J Biopharm Stat 2004;14:1065–84. avoided? Clin Lymphoma Myeloma 2008;8 Suppl 3:S107–17. 29. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a 13. Baldauf SL, Roger AJ, Wenk-Siefert I, Doolittle WF. A kingdom-level practical and powerful approach to multiple testing. J R Stat Soc phylogeny of eukaryotes based on combined protein data. Science B 1995;57:289–300. 2000;290:972–7. 30. Ekins S, Nikolsky Y, Bugrim A, Kirillov E, Nikolskaya T. Pathway 14. Brown JR, Douady CJ, Italia MJ, Marshall WE, Stanhope MJ. Univer- mapping tools for analysis of high content data. Methods Mol Biol sal trees based on large combined protein sequence data sets. Nat 2007;356:319–50. Genet 2001;28:281–5. 31. Yang Z, Nielsen R. Codon-substitution models for detecting molec- 15. Bamford S, Dawson E, Forbes S, et al. The COSMIC (Catalogue of ular adaptation at individual sites along specific lineages. Mol Biol Somatic Mutations in Cancer) database and website. Br J Cancer Evol 2002;19:908–17. 2004;91:355–8. 32. Zhang J, Nielsen R, Yang Z. Evaluation of an improved branch-site 16. Zwickl DJ. Genetic algorithm approaches for the phylogenetic likelihood method for detecting positive selection at the molecular analysis of large biological sequence datasets under the maximum level. Mol Biol Evol 2005;22:2472–9. likelihood criterion [dissertation]. Austin (TX): The University of Texas 33. Yang Z. PAML: a program package for phylogenetic analysis by at Austin, 2006. maximum likelihood. Comput Appl Biosci 1997;13:555–6. 17. Lewis PO. A genetic algorithm for maximum-likelihood phylogeny infer- 34. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol ence using nucleotide sequence data. Mol Biol Evol 1998;15:277–83. Biol Evol 2007;24:1586–91.

290 Mol Cancer Ther; 9(2) February 2010 Molecular Cancer Therapeutics

Downloaded from mct.aacrjournals.org on September 25, 2021. © 2010 American Association for Cancer Research. Published OnlineFirst February 2, 2010; DOI: 10.1158/1535-7163.MCT-09-0508

Cancer Cell Line Evolution

35. Anisimova M, Yang Z. Multiple hypothesis testing to detect lineages 43. Rudd MF, Sellick GS, Webb EL, Catovsky D, Houlston RS. Variants under positive selection that affects only a few sites. Mol Biol Evol in the ATM-BRCA2-CHEK2 axis predispose to chronic lymphocytic 2007;24:1219–28. leukemia. Blood 2006;108:638–44. 36. Spencer SL, Gerety RA, Pienta KJ, Forrest S. Modeling somatic 44. Storey A, Thomas M, Kalita A, et al. Role of a p53 polymorphism in evolution in tumorigenesis. PLoS Comput Biol 2006;2:e108. the development of human papillomavirus-associated cancer. 37. Bielas JH, Loeb KR, Rubin BP, True LD, Loeb LA. Human cancers Nature 1998;393:229–34. express a mutator phenotype. Proc Natl Acad Sci U S A 2006;103: 45. vanHD,MooijaartSP,BeekmanM,etal.Variationinthehuman 18238–42. TP53 gene affects old age survival and cancer mortality. Exp 38. Kuhner MK, Felsenstein J. A simulation comparison of phylogeny Gerontol 2005;40:11–5. algorithms under equal and unequal evolutionary rates. Mol Biol Evol 46. Whibley C, Pharoah PD, Hollstein M. p53 polymorphisms: cancer 1994;11:459–68. implications. Nat Rev Cancer 2009;9:95–107. 39. Boyer D, Quintanilla R, Lee-Fruman KK. Regulation of catalytic activity 47. Nozawa M, Suzuki Y, Nei M. Reliabilities of identifying positive of S6 kinase 2 during cell cycle. Mol Cell Biochem 2008;307:59–64. selection by the branch-site and the site-prediction methods. Proc 40. Li F, Ling X. Survivin study: an update of “what is the next wave?” Natl Acad Sci U S A 2009;106:6700–5. J Cell Physiol 2006;208:476–86. 48. Ragan MA. Matrix representation in reconstructing phylogenetic 41. Beerenwinkel N, Antal T, Dingli D, et al. Genetic progression and the relationships among the eukaryotes. Biosystems 1992;28:47–55. waiting time to cancer. PLoS Comput Biol 2007;3:e225. 49. Varley KE, Mitra RD. Nested Patch PCR enables highly multiplexed 42. Greenman C, Stephens P, Smith R, et al. Patterns of somatic mutation discovery in candidate genes. Genome Res 2008;18: mutation in human cancer genomes. Nature 2007;446:153–8. 1844–50.

www.aacrjournals.org Mol Cancer Ther; 9(2) February 2010 291

Downloaded from mct.aacrjournals.org on September 25, 2021. © 2010 American Association for Cancer Research. Published OnlineFirst February 2, 2010; DOI: 10.1158/1535-7163.MCT-09-0508

Molecular Evolutionary Analysis of Cancer Cell Lines

Yan Zhang, Michael J. Italia, Kurt R. Auger, et al.

Mol Cancer Ther Published OnlineFirst February 2, 2010.

Updated version Access the most recent version of this article at: doi:10.1158/1535-7163.MCT-09-0508

Supplementary Access the most recent supplemental material at: Material http://mct.aacrjournals.org/content/suppl/2010/02/02/1535-7163.MCT-09-0508.DC1

E-mail alerts Sign up to receive free email-alerts related to this article or journal.

Reprints and To order reprints of this article or to subscribe to the journal, contact the AACR Publications Subscriptions Department at [email protected].

Permissions To request permission to re-use all or part of this article, use this link http://mct.aacrjournals.org/content/early/2010/02/01/1535-7163.MCT-09-0508. Click on "Request Permissions" which will take you to the Copyright Clearance Center's (CCC) Rightslink site.

Downloaded from mct.aacrjournals.org on September 25, 2021. © 2010 American Association for Cancer Research.