Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

Method Single-cell mutational profiling and clonal phylogeny in cancer

Nicola E. Potter,1 Luca Ermini,1 Elli Papaemmanuil,2 Giovanni Cazzaniga,3 Gowri Vijayaraghavan,1 Ian Titley,1 Anthony Ford,1 Peter Campbell,2 Lyndal Kearney,1 and Mel Greaves1,4 1The Institute of Cancer Research, London, SM2 5NG, United Kingdom; 2Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, United Kingdom; 3Centro Ricerca Tettamanti, Clinica Pediatrica, Universita` di Milano Bicocca, Ospedale San Gerardo, 20900 Monza, Italy

The development of cancer is a dynamic evolutionary process in which intraclonal, genetic diversity provides a substrate for clonal selection and a source of therapeutic escape. The complexity and topography of intraclonal genetic archi- tectures have major implications for biopsy-based prognosis and for targeted therapy. High-depth, next-generation se- quencing (NGS) efficiently captures the mutational load of individual tumors or biopsies. But, being a snapshot portrait of total DNA, it disguises the fundamental features of subclonal variegation of genetic lesions and of clonal phylogeny. Single-cell genetic profiling provides a potential resolution to this problem, but methods developed to date all have limitations. We present a novel solution to this challenge using leukemic cells with known mutational spectra as a tractable model. DNA from flow-sorted single cells is screened using multiplex targeted Q-PCR within a microfluidic platform allowing unbiased single-cell selection, high-throughput, and comprehensive analysis for all main varieties of genetic abnormalities: chimeric fusions, copy number alterations, and single-nucleotide variants. We show, in this proof-of- principle study, that the method has a low error rate and can provide detailed subclonal genetic architectures and phylogenies. [Supplemental material is available for this article.]

Cancer clones evolve by a dynamic Darwinian process of muta- a deeper interrogation of clonal architecture in cancer cell pop- tional diversification under selective pressures exerted by tissue ulations from which evolutionary relationships of subclones at ecosystems, the immune system, and therapy (Nowell 1976; Merlo diagnosis and relapse can be derived (Anderson et al. 2011). et al. 2006; Greaves and Maley 2012). Not only do cancers of a Whole-genome amplification of single cells allowing both se- similar type differ in their genomic landscapes—indeed, each is quence and copy number analysis at the single-cell level is now unique—but intraclonal genetic and phenotypic diversity is an providing even more subclonal information (Navin et al. 2011; inherent feature of this disease (Marusyk et al. 2012). Progression Baslan et al. 2012; Zong et al. 2012). These methods combined of disease (Merlo et al. 2010; Park et al. 2010) and possibly the provide a striking portrait of cancer cell diversity and clonal evo- general intransigence of advanced disease may be attributable to lution through the construction of phylogenetic trees character- the genetic diversity within the cells of a tumor and/or within the ized by a nonlinear, branching architecture (Anderson et al. 2011; propagating or stem cells (Greaves 2013). The current vogue for Navin et al. 2011; Gerlinger et al. 2012; Yates and Campbell 2012). personalized medicine and targeted therapy could well be thwarted But the type of mutation that can be interrogated restricts this ap- or achieve only transient benefit if the targets are themselves sub- proach, and the complexity of clonal architectures is underestimated. clonally segregated (Greaves and Maley 2012; Swanton 2012). Ideally, what is required for a comprehensive interrogation of Second- or next-generation sequencing (NGS) allows whole the complex genomics of cancer cells is a methodology for single- exome or whole genome sequencing of bulk cancer cells (Stephens cell analysis that has the following attributes: (1) an unbiased cell et al. 2009; Yates and Campbell 2012), and with adequate depth of sample from the cancer; (2) highly efficient single-cell sorting; (3) parallel sequence reads, it is apparent that many acquired muta- a relatively high-throughput analysis of at least a few hundred tions in the cancer genome are subclonally distributed (Nik-Zainal cells; and (4) a method that allows the simultaneous detection of et al. 2012). Subclonal genetic diversity of genome-wide copy multiple genetic alterations of different types, e.g., chimeric fusion number changes has also been demonstrated in a variety of cancers , copy number alterations (CNA), and single-nucleotide var- (Klein and Stoecklein 2009; Navin et al. 2010). Cancer cell genetic iants (SNVs) in a single cell. We here provide proof-of-principle heterogeneity at the single-cell level has long been recognized by data using single acute lymphoblastic leukemic (ALL) cells to karyotyping (Wolman 1986) and by fluorescence in demonstrate that this is feasible using multiplex targeted DNA situ hybridization (FISH) of tissue sections (Clark et al. 2008). The amplification from flow-sorted single cells followed by high- use of multiplex targeted FISH with three or four colored probes to throughput Q-PCR using the BioMark HD microfluidic platform interrogate the copy number of specific DNA targets facilitates

Ó 2013 Potter et al. This article is distributed exclusively by Cold Spring Harbor 4Corresponding author Laboratory Press for the first six months after the full-issue publication date (see E-mail [email protected] http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available Article published online before print. Article, supplemental material, and publi- under a Creative Commons License (Attribution-NonCommercial 3.0 Unported), cation date are at http://www.genome.org/cgi/doi/10.1101/gr.159913.113. as described at http://creativecommons.org/licenses/by-nc/3.0/.

23:2115–2125 Published by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/13; www.genome.org Genome Research 2115 www.genome.org Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

Potter et al.

(Fluidigm). The versatility of Q-PCR allows simultaneous analysis in Figure 1. A cell was deemed to be positive for a SNP (or SNV) if of a variety of DNA alterations, and coupling this technology with the Q-PCR cycle threshold (CT) value was below 28. The presence a microfluidic platform capitalizes on a high-throughput system or absence of the signal from the probe complementary to the with small reaction volumes that lends itself to single-cell analysis. wild-type sequence determined heterozygous or homozygous Q-PCR analysis combined with the BioMark HD has been used for mutations, respectively, but the copy number of each allele cannot single-cell analysis (Citri et al. 2012; Pina et al. be inferred (Fig. 2A). The DDCT method (Applied Biosystems, Life 2012) and single nucleotide polymorphism (SNP) allelic discrimi- Technologies Ltd.) with modifications to incorporate the results nation analysis with bulk DNA (Wang et al. 2009) but not, to date, from multiple Taqman assays targeting the same region was used for single-cell mutational screening. to determine the relative copy number for each locus; the use of multiple assays to target one region increased the accuracy of at- tributed DNA CNAs (Fig. 2B). Results Several approaches were adopted during this experiment to optimize and confirm the presence of a single cell and ensure that Development of the method using the REH all assays performed efficiently under these experimental condi- -derived cell line tions (details can be found in the Methods). Efficient FACS sorting The established B-cell precursor cell line REH was first used to de- of single cells was initially confirmed by microscopy. The fluores- velop and refine this single-cell analysis method. Our previous cent cells were sorted onto a glass slide with the aim of collecting FISH (Horsley et al. 2008) and more recent SNP array analysis (data 48 independent single cells; this established the efficiency of the not shown) has characterized this cell line as ETV6-RUNX1 fusion BD FACSAria I (SORP) instrument (BD). In our hands, the failure gene positive with multiple CNAs including CDKN2A and MX1 rate is 2%–4% (one to two occasions per 48 attempts), the majority and single nucleotide polymorphisms. In this study, the EPOR SNP of which are a failure to sort a cell compared with two cells found in (SNP rs318720) was adopted as a surrogate acquired heterozygous 0.002% of occasions (one in every 528 attempts to sort a single SNV anticipated to be present in every cell. cell). As it is not possible to visualize single cells in a 96-well plate, Briefly, single carboxyfluorescein diacetate succinimidyl ester we sought to quantify the DNA in each well to identify those with (CFSE)–labeled REH and cord blood cells (normal diploid control) high amounts (of the B2M gene) indicating that two or more cells were sorted into individual wells of a 96-well plate, lysed, and DNA may have inadvertently been collected. These data were conse- target amplification completed for regions encompassing the clo- quently removed from the phylogenetic analysis but only consti- notypic ETV6-RUNX1 fusion genomic sequence, CDKN2A, MX1, tuted a maximum of two wells per plate. and the SNP rs318720. The B2M locus, located in a diploid region To estimate the error rate of each assay (gene fusion, CNA and of the genome, was used as a control. The resulting reaction mix SNV assays), a control experiment consisting of 48 cord blood cells was then diluted and Q-PCR-completed using the 96.96 dynamic was completed for each patient-specific multiplex experiment. array and the BioMark HD. The workflow for this method is shown Gene fusion and SNV assay false-positive error rates can be found in Supplemental Table 2. Only two assays (SNVs in EZH2 and BAZ2A) generated false-positive results in 2% (1/48) of cord blood cells. CNA assays had an error rate ranging from 4.3% to 7.1% (Supplemen- tal Table 3). False-negative errors rates could not be directly determined as each assay is patient-specific and no positive single-cell control sample is available. How- ever, when using allelic discrimination as- says to detect SNVs, each assay generates a signal either for the wild-type sequence, the mutant sequence, or both (most com- mon here). Each single cord blood control cell produced a signal for the wild-type se- quence confirming the assay efficiency in the multiplex system. The SNP rs318720 analyzed in the REH cell line produced a signal for the wild-type and polymorphic sequence in each cell interrogated. The as- say error rates were used to define a bona fide minor subclonal population (Methods, Defining Subclonal Populations at Low Figure 1. Schematic workflow of the multiplex targeted Q-PCR approach for the simultaneous de- tection of gene fusions, DNA copy number alterations, and mutations in single cells. Initially, CFSE- Frequencies section). We were also careful labeled single cells are sorted into each well of a 96-well plate and then lysed. Multiplex DNA specific to compare the mutation allele burdens target amplification is then completed to amplify a DNA region of interest (fusion gene, copy number generated by each approach used in this alteration, or SNV) from the two copies found in a single cell to an amount that can be detected by study, including exome sequencing, 454 Q-PCR using the BioMark HD. Amplified samples and assays are then loaded into a 96.96 dynamic array pyrosequencing, allelic discrimination by that utilizes a valve-controlled capillary network to bring these two mixes together at nanoliter volumes (completed in the IFC controller) for the Q-PCR reaction to take place. This final Q-PCR step determines digital PCR using bulk DNA, and geno- the gene fusion, mutation, or copy number alteration status for each single cell. typing of each single cell (Table 1).

2116 Genome Research www.genome.org Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

Genetic analysis of single cancer cells

Figure 2. Single-cell genetic analysis of leukemic cells using the multiplex targeted Q-PCR approach and the BioMark HD platform. (A) Heatmap depicting an example of raw Q-PCR data from the BioMark HD. The rows represent single cells including six cord blood cells and seven REH cells. The columns represent assays, each completed in quadruplicate including B2M (one of three assays), MX1 (two of three assays), CDKN2A (two of three assays), the ETV6-RUNX1 fusion gene assay, and the EPOR SNP (rs318720) assay containing two Taqman probes, one complementary to the wild-type sequence (labeled with FAM) and the other complementary to the SNP sequence (labeled with VIC). The colored boxes at the junction of a row and column indicate the raw CT value (according to the key on the right) obtained for a Q-PCR reaction involving the indicated cell and assay. Assays targeting a mutation or the fusion gene provide a definitive positive or negative result indicating the presence or absence, respectively, of an alteration. The DNA copy number assays provide a raw CT value that requires further analysis (standard DDCT method, Applied Biosystems) to attribute a DNA copy number to the target gene of interest for a single cell; an example can be found in B (refer to the Single-Cell Analysis section in the Methods, and Supplemental Material). (Green arrow) The Q-PCR amplification curves generated from each copy number assay for a single cord blood cell. (Black cross in a colored box) An inadequate amplification curve. (B) Graph depicting the estimated DNA copy number of MX1 attributed reliably to 89 single cells given the assay results from B2M (assay 3) and MX1 (assay 3); one of the nine estimated copy number results used to confidently attribute a DNA copy number to the gene of interest for a single cell. The height of the bar indicates the estimated DNA copy number, and the color of the bar indicates the integer; (light blue) one copy; (dark blue) two copies; (green) three copies; and (yellow) four copies. (CB) Cord blood. (C ) Subclonal genetic architecture of the REH cell line inferred by multiplex targeted Q-PCR and confirmed by FISH analysis (126 and 100 cells, respectively); the percentages in parentheses are those obtained by FISH analysis. All cells harbored the ETV6-RUNX1 fusion (F) and the EPOR SNP compared with cord blood cells. DNA copy number is indicated for each gene and subclonal population. Representative FISH images are shown next to their respective subclone. (D) Subclonal genetic architecture of leukemic cells from a child with Down’s syndrome and acute lymphoblastic leukemia (DS-ALL) generated by multiplex targeted Q-PCR and FISH analysis (115 and 100 cells, respectively); 98% of cells harbored the P2RY8-CRLF2 fusion and the IL7R mutation (IL7Rm) by multiplex targeted Q-PCR. Of these, the majority were heterozygous mutations (IL7Rm hete). A minor subclone (2%) had a homozygous IL7R mutation (IL7Rm homo). Loss of CDKN2A was subclonal to the IL7R mutation and proceeded to homozygous loss in 11% of cells. FISH for the P2RY8-CRLF2 fusion and CDKN2A copy number confirmed these results (percentages in parentheses); the IL7R mutation cannot be detected by FISH.

Data from single cells that were removed from the Q-PCR also removed. On average, 75% of interrogated single cells gener- analysis included those wells that showed no data (no cell), those ated complete comprehensive results. A detailed breakdown of the wells in which all B2M assays did not have a strong signal (<28 CT), single-cell experiment data with explanations as to why data were and wells in which all CNA assays for a target region of interest did removed can be found in Supplemental Table 4. not produce CT results within one CT. Data from suggested minor All REH cells (n = 126 À single cells considered for phyloge- subclonal populations that did not exceed assay error rates were netic analysis) scored positive for the ETV6-RUNX1 fusion gene

Genome Research 2117 www.genome.org Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

Potter et al.

Table 1. Whole-exome sequencing results for two ETV6-RUNX1 positive acute lymphoblastic leukemia cases

Allele burden Allele burden Wild- Allele burden estimates by 454 Allele burden estimates for type Mutant estimates by pyroseqencing estimates by single cells Case Chr. Genea p.Change allele allele Effect NGS (%) (CI) (%) (CI) digital PCR (%) (%) (CI)

Case A 5 PIK3R1 p.E518Q G C Missense 13.43 (6.7–24.5) 4 (2.1–8.1) 4.30 1.33 (0.6–2.8) 6 DAXX p.Q565* G A Nonsense 10.69 (6.2–17.6) 4 (1.6–10.3) 1.70 0.76 (0.2–2.1) 7 EZH2 p.P577L G A Missense 38.13 (30.1–46.8) 56 (48.3–63.3) 36.70 46.58 (42.3–50.9) 2 FSIP2 p.N4564Y A T Missense 11.11 — — — 17 DNAH17 p.A559G G C Missense 33.82 — — — 7 PON3 p.R32Q C T Missense 85.71 — — — 5 DNAH5 p.G3653E C T Missense 45.31 — — — 1 TNR p.D1000D G A Silent 29.47 — — — 3 BCHE p.I141T A G Missense 42.25 (30.8–54.5) 48 (43.1–53.1) 37.10 46.58 (42.3–50.9) 12 BAZ2A p.P676L G A Missense 37.17 (28.4–46.8) 34 (25.8–42.3) 33.40 43.35 (39.1–47.7) 10 C10orf112 p.R545C C T Missense 9.48 — — — 8 DKK4 p.D95H C G Missense 52.80 — — — 13 SMAD9 p.P62P C T Silent 34.72 — — — Case B 13 RB1 p.K810N G C Missense 4.65 (1.9–10.3) 4 (1.3–10.5) 1.60 2.39 (1.3–4.3) 12 KRAS p.G12C C A Missense 43.08 (31.1–55.9) 43 (38.2–47.1) 51.0 43.43 (39.1–47.9) 5 MAN2A1 p.T554M C T Missense 34.57 — — — 3 SLC7A14 p.G150A C G Missense 47.37 — — — 3 KALRN p.T1215M C T Missense 35.21 — — — 1 SRSF11 p.Q22E C G Missense 5.30 (3.2–8.6) 4.9 (2.4–9.4) 2.40 2.39 (1.3–4.3) 3 COL6A5 p.V32M G A Missense 40.00 — — — 8 TRHR p.I131M C G Missense 43.14 — — — 12 WDR66 T529FS > NS A TCCCC Nonsense 0.17 — — — aMutation targets selected for single cell interrogation are shown in bold text. Confidence intervals (CI) were established using binomial proportion test for each approach. As a guide, an allele burden of 50% confers either a het- erozygous mutation in every cell or a homozygous mutation in 25% of cells. and the EPOR SNP rs318720. Major subclonal populations had but retained the mutant allele. Loss of CDKN2A was subclonal to varying copies of the MX1 gene; 49% of cells retained two copies of the IL7R mutation and proceeded to homozygous loss in 11% of MX1 compared with 42% of cells that had gained either one or two cells. FISH for the P2RY8-CRLF2 fusion and CDKN2A copy number copies of this gene (Fig. 2C). The majority of cells showed loss of confirmed these results with respect to these loci. The loss of the both CDKN2A copies, but minor subclones were characterized by wild-type IL7R allele in a minor subclone was not anticipated. either loss of MX1 and/or loss of only one copy of CDKN2A. These Subclonal segregation of a homozygous mutation would be very DNA copy number data were independently confirmed by FISH difficult to clarify, except using single-cell analysis. using BAC probes complementary to the ETV6-RUNX1 fusion gene, CDKN2A, and MX1. The subclonal structure and percentages Single-cell genetic analysis of ETV6-RUNX1 positive ALL obtained by FISH correlated well with the results obtained by sin- cases with exome sequencing data gle-cell genetic analysis using multiplex targeted Q-PCR and the BioMark HD (Fig. 2C). To expand our approach and confirm that it could be applied to more complex genetic data sets, we selected two ETV6-RUNX1 positive ALL diagnostic bone marrow samples that had been sub- Single-cell genetic analysis of a Down’s syndrome ALL case jected to both whole-exome sequencing to identify SNVs (Table 1) We replicated this approach using leukemic cells from a child with and SNP arrays for CNA (confirmed by exome analysis) (Table 2). Down’s syndrome and acute lymphoblastic leukemia (DS-ALL) for Variations in mutation allelic burden suggested the presence of which we had previously characterized the genomic alterations subclonal populations in both cases and was confirmed using using SNP analysis and targeted Sanger Sequencing. This case was Digital PCR (Table 1). Each case also harbored varied chromosomal expected to have a simple but informative clonal architecture in- alterations in both size and location, and known recurrent sec- volving a P2RY8-CRLF2 fusion gene, loss of CDKN2A, and an IL7R ondary CNAs in ALL (Mullighan et al. 2007) were identified in- mutation involving the deletion of 8 base pairs and the insertion of cluding loss of PAX5 and CDKN2A. 10 nonconsensus base pairs in exon 6. Genomic targets for Case A included SNVs in BCHE (exome The subclonal genetic architecture of this DS-ALL case was SNV read depth 713, variant reads 30), EZH2 (exome SNV read more complex than suggested by SNP analysis. Using the single- depth 1393, variant reads 53), PIK3R1 (exome SNV read depth cell multiplex targeted Q-PCR approach, 115 single cells were an- 673, variant reads 9), DAXX (exome SNV read depth 1313, variant alyzed and compared with a further 100 cells using FISH (Fig. 2D). reads 14), and BAZ2A (exome SNV read depth 1133, variant reads Cells that did not harbor any alterations were present at 4% 42) and CNAs in CCNC and TBL1X. Targets for Case B included and 2% by FISH and multiplex targeted Q-PCR experiments, re- SNVs in KRAS (exome SNV read depth 653, variant reads 28), RB1 spectively, reflecting the 90% BLAST count/low nonleukemic cell (exome SNV read depth 1293, variant reads 6), and SRSF11 (exome mix in this diagnostic sample. The P2RY8-CRLF2 fusion and the SNV read depth 3023, variant reads 16) and CNAs in VPREB1, IL7R mutation were present in 98% of cells by multiplex targeted PAX5, CDKN2A, and DPF3 (Tables 1, 2). SNVs were chosen based Q-PCR. A minor subclone (2%) had lost the IL7R wild-type allele on allelic burden encompassing both high, low, and intermediate

2118 Genome Research www.genome.org Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

Genetic analysis of single cancer cells

Table 2. DNA copy number data for two ETV6-RUNX1 positive acute lymphoblastic leukemia cases

Case CNA Type Chr. Cytoband start Cytoband end Size (bp) Genesa

Case A 1 Loss 3 p21.31 p21.31 736.86 ELP6, CSPG5, SMARCC1, DHX30, MIR1226, MAP4, CDC25A, CAMP 1 Loss 3 p21.31 p21.31 222.80 PFKFB4, UCN2, COL7A1, MIR711, UQCRC1, TMEM89, SLC26A6, CELSR3, NCKIPSD, IP6K2, PRKAR2A 1 Loss 3 p21.31 p21.31 383.99 RHOA, TCTA, AMT, NICN1, DAG1, BSN, APEH, MST1, RNF123, AMIGO3, GMPPB, IP6K1 1 Loss 3 p21.31 p21.2 634.70 RBM5, SEMA3F, GNAT1, ..., CACNA2D2, C3orf18, HEMK1, CISH, MAPKAPK3, DOCK3 1 Loss 6 q14.1 q27 91165.16 Many genes including CCNC 0 Loss 7 p14.1 p14.1 49.85 TRG 1 Loss 8 p23.1 p23.1 217.17 SgK223b 1 Loss 9 p24.3 p24.3 65.91 SMARCA2 1 Loss 9 p24.3 p24.3 126.76 DMRT1, DMRT3, DMRT2 1 Loss 21 q22.11 q22.11 255.24 MRPS6, SLC5A3, LINC00310 1 Loss X p22.33 q25 121029.37 Many genes including TBL1X Case B 1 Loss 4 q31.3 q31.3 283.60 FBXW7 1 Loss 4 q28.3 q28.3 260.02 1 Loss 4 q33 q33 95.85 MFAP3L, AADAT 1 Loss 7 p14.1 p14.1 116.23 TRG 1 Loss 7 q34 q34 140.04 TRB 1 Loss 8 q24.11 q24.11 58.36 EXT1 1 Loss 9 p21.3 p21.3 511.07 LOC554202, IFNE, MIR31, MTAP, C9orf53, CDKN2A 1 Loss 9 p13.3 p13.2 1572.02 CREB3, GBA2, RGP1, MSMP, NPR2, SPAG8, ..., RNF38, MELK, PAX5, ZCCHC7 1 Loss 10 q21.1 q21.1 53.17 BICC1 1 Loss 11 q13.4 q13.4 55.52 CHRDL2 1 Loss 14 q24.2 q24.2 123.19 DPF3 1 Loss 22 q11.22 q11.22 818.48 VPREB1, LOC96610, ZNF280B, ZNF280A, PRAME, LOC648691, POM121L1P, GGTLC2, MIR650 aTarget genes selected for single-cell analysis located in regions of loss are shown in bold text. bAccording to COSMIC (http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/). frequencies; not all SNVs were included. However, allelic discrim- tecture of the tree shows a quasi-linear structure with a direction ination assays designed for DNAH17, MAN21, and SLC7A14 did toward subclone A4 representing 87.4% of the clonal population; not show sufficient target specificity to wild-type and mutant se- the subclonal heterogeneity in this case is therefore modest, given quence in the multiplex Q-PCR reaction, and the resulting data the number of genomic alterations investigated. The most recent were inconclusive. Only one attempt was made to design assays for common ancestor (MRCA) of this tree, which is the most recent each chosen SNV. The patient-specific ETV6-RUNX1 genomic fu- ancestor from which all clones of the group directly descend, is sion gene sequence was obtained by long-distance PCR (Wiemels subclone A3. This clone harbors the ETV6-RUNX1 fusion in addi- and Greaves 1999) guided by fusion break-point coordinates from tion to BCHE and EZH2 mutations, suggesting that these alterations whole-exome sequencing. Comprehensive data were collected were relatively early mutational changes in the pathogenesis of from 261 single cells for Case A and 254 from Case B. this individual ALL. The major clone A4 may have a selective fit- ness advantage over all other subclones associated with the acquisition of CCNC deletions and a BAZ2A SNV. But, as this is Derivation of phylogenetic trees using maximum parsimony a single time point snapshot of a dynamic process, we cannot ex- To uncover the clonal phylogeny from these single-cell data, we clude the possibility that subclone A4 was spawned before sub- used the maximum parsimony (MP) method (Page and Holmes clones A5 and A2, which are characterized by heterozygous 1998). The most parsimonious phylogenetic tree is the tree de- mutations in PIK3R1 and DAXX, respectively. The maximum scribing the best estimated phylogenetic relationships given the parsimony algorithm shows the inferred MRCA of the A4–A5 included taxa (group of related cells or clones). Tree branch lengths clonal clade (Fig. 3A, gray box), a group of cells that has died out or are directly proportional to the number of evolutionary changes been outcompeted or if still present, exists at too low a frequency inferred, and the points at which the branches diverge (nodes) to detect. represent the ancestor state of a clonal clade (a monophyletic Maximum parsimony searches for Case B produced two group that includes all descendants of the ancestor). A phylogeny equally parsimonious trees with identical topological structures inferred using this criterion shows how the clonal expansion has (Fig. 3B). Both trees show complex branching structures with evolved from a common ancestor toward the observed states. In a marked subclonal heterogeneity of seven clones that differ only the event that two (or more) trees hold equal parsimony, all trees for the position of subclones B4 and B5. The MRCA of the clonal are accepted (Page and Holmes 1998). Detailed method explana- expansion is represented by subclone B2, which shows a homo- tions of this approach can be found in the Methods and the Sup- zygous deletion of VPREB1 in addition to the ETV6-RUNX1 fusion. plemental Material. This subclone is the biggest clonal group after clone B3 (at di- Maximum parsimony searches for Case A resulted in one agnosis) despite the genetic diversity of the progressive subclonal maximum parsimonious tree (Fig. 3A). The phylogenetic archi- populations, suggesting that this subclone may be quiescent or

Genome Research 2119 www.genome.org Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

Potter et al.

Figure 3. (Legend on next page)

2120 Genome Research www.genome.org Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

Genetic analysis of single cancer cells reside in a niche environment. The acquisition of the KRAS mu- genetic analysis is required for a robust and definitive designation tation presumably provides selective advantage as a secondary of the segregation pattern of mutations within cancer clones. driver event. A minor subclone was also identified in which the Theoretically, the best approach might be to sequence the ge- wild-type KRAS allele was lost, suggesting the acquisition of ho- nomes of individual cells. Significant progress has recently been mozygosity for the KRAS mutation. The number of mutant gene made in this regard (Baslan et al. 2012; Zong et al. 2012), but error copies (one or two) in each single cell cannot be determined using rates and low cell throughput are, currently, significant limita- the Taqman KRAS SNV assay used here, but the increased allele tions. We adopted the alternative strategy of analyzing single cells burden characterized by whole-exome sequencing, 454 pyrose- from cancers whose genomic, mutational profile at the cell pop- quencing, and digital PCR validates this conclusion. ulation level was already established by high-resolution SNP arrays The position of subclones B4 or B5 indicates that these pop- and whole-exome sequencing. ulations are equally parsimonious ancestors of the clonal clade Using the strategy outlined here, we were able to detect all constituted by subclones B3, B7, and B8, representing ;65% of all three categories of mutational change—fusion gene, CNA, and examined cells. This represents the reiteration of either PAX5 or SNV—in a single cell. Our method is amenable to high-throughput CDKN2A loss independently in two different branches. Reiterated analysis: In each case, we were able to analyze 200–300 leukemic CNA of the same gene is a feature of the subclonal genetic archi- cells. We assume that the profiles that emerge from these numbers tecture in pediatric ALL (Anderson et al. 2011; Waanders et al. of cells are representative of the patients’ . This would be 2012), suggesting that these deletions may not only provide se- more demanding with adult carcinomas where the topographical lective advantage but may be a target for DNA level breakage via, segregation of distinct subclones could result in selective sampling, for example, RAGS (Kitagawa et al. 2002; Waanders et al. 2012) or but clearly multiple small biopsies could be assessed (Park et al. AID (Longerich et al. 2006; Klemm et al. 2009). 2010; Gerlinger et al. 2012). The screening of many thousands of single cells is possible by the method we report but is restrained at Discussion present by cost. The phylogenetic trees inferred from the cellular distribution Cancers have complex patterns of acquired mutations, and pa- of SNVs in larger numbers of cells provide additional evidence tients with closely related subtypes of cancer have unique or clone- that cancers evolve within a branching architecture of subclones. specific mutation patterns. Additional complexity clearly exists These detailed and complex subclonal architectures would not be within each individual cancer, as mutations are acquired serially detected by other genetic techniques. Prior studies on ALL sug- and with a variegated pattern of distribution in subclones. As gested that the latter is generated and sustained by genetically di- mutational profiles are increasingly used for differential diagnosis, verse cancer propagating or stem cells (Anderson et al. 2011; Notta prognostication, and therapeutic targeting, this multidimensional et al. 2011), and it will be important to confirm this using the complexity is of some consequence. current microfluidic platform. Systematically interrogating subclonal genetic complexity This proof-of-principle study using leukemic cells illustrates poses a considerable technical challenge. A clonal architecture or that it is possible to assess single cells simultaneously for different phylogeny can be inferred bioinformatically by analysis of high- types of genetic lesions—fusion genes, copy number alterations, depth NGS data (Nik-Zainal et al. 2012; Yates and Campbell 2012). and single nucleotide variants—and from these data to construct Nik-Zainal et al. (2012) reconstructed clonal phylogenies through a clonal, evolutionary phylogeny. Leukemias are likely to be sig- the development of novel algorithms that rely on whole-genome nificantly less complex in this respect than carcinomas (Vogelstein sequencing data (2003 coverage) to phase mutations with germline et al. 2013), but nevertheless, the subclonal architectures polymorphisms and define clonal and subclonal phylogenies. we illustrate here for ALL are likely to be a significant un- This is in marked contrast to the exome sequencing data used derestimation of complexity. Higher definition of clonal in this study, which do not offer the opportunity of continuous structure, including minor clones at <1%, is however achievable genomic information that would allow one to reconstruct such bybothincreasingthedepthofinitialsequencingandby phylogenies with confidence. Additionally, exome data in leuke- screening larger numbers of single cells. Also, informative mias show low total mutation burden (on average 10), which though these clonal analyses are, they remain single time point renders them further limited to differentiate clonal phylogenies. It snapshots of a very dynamic evolutionary process. Ideally, is, in fact, this limitation of exome sequencing data and to a lesser single-cell genetics and phylogenetic tree construction should extent whole-genome sequencing that the present study is pre- be applied to serial samples from individual patients, i.e., di- cisely addressing where, as shown in the example of Case B, two agnosis versus relapse, primary versus metastases, and pre- SNVs of close allele burden estimates RB1 and SRSF11 (4.65 and versus post-chemotherapy. It is known that these major tran- 5.30, respectively) could be in the same (or separate) subclone with sitions can involve selective sweeps or stringent clonal selection equal probability. Without single-cell data, this level of phyloge- (Mullighan et al. 2008; Liu et al. 2009; Anderson et al. 2011; netic resolution is not possible. We conclude that single-cell Diaz et al. 2012).

Figure 3. Phylogenetic analysis results for Cases A and B. In each case the observed clone is indicated by a circle. Yellow circles indicate tumor clones, and black circles indicate the normal cell population. The alterations are listed below each subclone; excluding ETV6-RUNX1, those without a number indicate the presence of a mutation and those with a number indicate DNA copies accordingly. The boxed subclone in gray is inferred; a group of cells that has died out or been outcompeted, or if still present, existsatalowfrequencythatcannotbereliablydetectedbythisapproach. The number in italics at each node indicates the jackknifing value. The distance unit is indicated. (A) One parsimonious tree was found for Case A consisting of four subclones with modest heterogeneity. The major clone (A4) represents 87.4% of the population. The size of each circle is pro- portional to the number of single cells included in the subclone except A4, which has been reduced by a third in this tree. (B) In Case B, there are two equally parsimonious trees composed of seven subclones. These two trees differ by the position of subclones B4 and B5, which are equal parsi- monious ancestors to subclones B3, B7, and B8. This case shows increased heterogeneity with the major clone representing 54.9% of the population (B3). The major clone B3 is reduced by half in this tree.

Genome Research 2121 www.genome.org Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

Potter et al.

Dissecting the detailed clonal architecture of cancer has sig- as per the manufacturer’s guidelines but without the pre-enrich- nificant clinical implications. The extent of intraclonal genetic ment PCR amplification ENREF_1. Solid phase reversible immo- diversity may be predictive of progression of disease (Maley et al. bilization (SPRI) bead cleanup was used to purify products in 2006) or clinical outcome (Mroz et al. 2013). Cancer genomics preparation for sequencing (Agencourt AMPure XP beads, Beckman holds the promise of personalized medicine with therapy targeted Coulter). Flow-cell preparation, cluster generation, and paired-end at products of recurrent ‘‘driver’’ mutations (Chin et al. 2011). sequencing (75 bp reads) were performed according to the Illumina Subclonal segregation would not be a desirable credential of any protocol guidelines on an Illumina GAII Genome Analyzer. The candidate target. Targeting even a major branch of the phyloge- target coverage per sample was for 70% of the captured regions at a minimum depth of 303 sequencing coverage. netic tree rather than the founder lesion would be predicted to Sequencing reads were aligned to the (NCBI have only transient benefit and, moreover, provide selective build 37) using the BWA algorithm on default settings (Li and pressure for the emergence of previously minor clones (Greaves Durbin 2010). Duplicate reads derived by PCR were removed using and Maley 2012; Swanton 2012). Picard, and reads mapping outside the targeted region of the ge- nome were excluded from the analysis. Standard internal quality- Methods control evaluation of sequencing data including the percent of uniquely mapped reads, the percent of target region covered, the Samples percent of unmapped reads, sequence quality metrics, and total The precursor B-cell leukemic cell line, REH, was purchased from sequencing output (in GB) was performed for all samples. The American Type Culture Collection (ATCC, Virginia, USA) and cul- remaining uniquely mapping reads (;60%) provided 60%–80% tured in RPMI-1640 medium, 10% FCS. coverage over the targeted exons at a minimum depth of 303. The patient samples studied in this investigation were col- Leukemic sample identity relative to the matched remission sam- lected from Italian or UK hospitals, with local ethical review ple was controlled by digital genotyping of 100 genome-wide SNP committee approval (CCR 2285, Royal Marsden Hospital NHS markers prior to variant calling. Foundation Trust). Bone marrow aspirates or peripheral blood In house-variant caller CaVEMan (cancer variants through samples underwent lymphoprep separation and were viably fro- expectation maximization) was used to call single nucleotide zen in 10% DMSO, 90% FCS and stored in liquid nitrogen (10% substitutions (Varela et al. 2011). To call insertions and deletions, DMSO, 90% FCS). split-read mapping was implemented as a modification of the Pindel algorithm (Ye et al. 2009). Copy number and loss of het- erozygosity (LOH) analysis was performed using ASCAT (Van Loo Bulk sample analysis et al. 2010). Further details of these steps can be found in the Sup- plemental Material. For validation, all putative somatic indels were Single nucleotide polymorphism and DNA copy number array analysis confirmed by capillary sequencing via 454 pyrosequencing (Roche) of for Case A and Case B both tumor and remission samples from each patient. Mutant allele To define DNA copy number alterations and SNVs for these cases, burden estimates were derived from the fraction of reads reporting the we used the Affymetrix Cytogenetics Whole Genome 2.7M Array mutant allele over the total read depth at each genomic location, and (Affymetrix). Briefly, 100 ng of genomic DNA from both diagnostic confidence intervals were derived using the binomial distribution. and remission samples was whole-genome amplified (WGA) using Commercial and custom primers for Q-PCR and digital PCR the Affymetrix Cytogenetics Reagent Kit and the Affymetrix assay protocol according to the manufacturer’s instructions. Samples were Primer Express Software (Applied Biosystems) was used to design then fragmented to generate small products (<300 bp), which were custom genotyping Taqman Q-PCR assays for an SNV that could subsequently biotin-labeled, denatured, and loaded into the arrays. distinguish the mutant allele from its wild-type counterpart. Each After hybridization, the chips were washed, stained (streptavidin- SNV assay contained allele-specific minor-groove binder (MGB) PE), and scanned using the GeneChip Scanner 3000. CEL files were probes for the wild-type allele (FAM-labeled) and the mutant allele generated using Affymetrix GeneChip Command Console (AGCC) (VIC-labeled) (Supplemental Table 2). These assays were tested v3.1 and analyzed by Chromosome Analysis Suite (Affymetrix) to ensure specificity and reliability (refer to the Assay Validation software, version 1.2.2. Quality-control metrics for each case can section in the Supplemental Material). Assays to detect a fusion be found in Supplemental Table 1A. SNP and DNA copy number breakpoint were designed using a similar approach with an FAM- analysis for REH and the DS-ALL sample was completed using the labeled MGB probe straddling the fusion break point. DNA copy Affymetrix Genome-Wide Human SNP Array 6.0. The method number Taqman assays were purchased from Applied Biosystems details can be found in the Supplemental Material. as these have been designed for uniform amplification efficiency and have been commercially validated. Three CNA assays were ETV6-RUNX1 fusion breakpoints chosen within each DNA target region of interest and the diploid reference region encompassing B2M. Genomic sequences for the ETV6-RUNX1 positive samples (REH cell line and patient Cases A and B) were cloned using long-distance inverse PCR using DNA from bulk cells as described before (Wiemels Digital PCR and Greaves 1999). P2RY8-CRLF2 fusions were sequenced according Digital PCR was used to quantify target sequences and estimate to Mullighan et al. (2009). mutant allele burdens within bulk DNA from each patient at di- agnosis and in cord blood. This was completed using the 12.765 Whole-exome sequencing digital array (Fluidigm) and the BioMark HD. This digital array Matched genomic DNA (3–5 mg) from leukemic and complete re- contains 12 panels each with 765 individual microfluidic cham- mission samples from the two cases with childhood acute lympho- bers (6 nL volume per chamber). Six targets were simultaneously blastic leukemia was prepared for Illumina paired-end sequencing interrogated according to the manufacturer’s instructions; 5 ng of (Illumina). Exome enrichment was performed using the Agilent DNA per panel. The number of target molecules per panel was de- SureSelectXT Human All Exon 50Mb kit (Agilent Technologies Ltd.) termined using BioMark HD Digital PCR software; SNV frequencies

2122 Genome Research www.genome.org Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

Genetic analysis of single cancer cells were calculated by dividing the number of mutant allele copies by number. Estimated copy number results were not considered if the the total number of copies for the wild-type and mutant alleles. confidence value was <50% or the estimated copy number was greater than four (with only quadruplicates per assay, the results Single-cell analysis are not robust enough to accurately detect DNA copy numbers greater than four) (Weaver et al. 2010). At least two of the nine Single-cell labeling and flow sorting estimated copy numbers must have a confidence value above 50% to calculate the final copy number for a region of interest. Single-cell sorting was performed on a BD FACSAria I (SORP) in- strument (BD) equipped with an automated cell deposition unit Defining subclonal populations at low frequencies using the following settings: 100-mm nozzle, 1.4 bar sheath pres- sure, 32.6 kHz head drive, and a flow rate that gave one to 200 Each assay type (gene fusion, CNA, or SNV) varied in error rate events per second. Details of viable cell thawing, single-cell CFSE demanding careful consideration when defining subclonal pop- staining (according to the manufacturer’s instructions), cell sorting ulations at low frequencies. Gene fusion assays and SNV assays did parameter explanations, and the assessment of single-cell sorting not yield any false-positive results in the control experiments, efficiencies can be found in the Supplemental Material. Two 96- except EZH2 and BCHE for which we saw one cell (Supplemental well plates of single cells were collected for REH and the DS-ALL Table 2). However, CNA assays had an average error rate of 5.4% Case, and four plates were collected for the two ALL Cases. The (Supplemental Table 3). Consequently, the following criterion was plates were composed of a no template control (NTC), 11 control used to define a bona fide subclonal population: An observed sig- cord blood cells, and 84 target cells (REH or patient cells). nal pattern (character state) must be attributed to four or more cells (in this experiment, the equivalent of ;1%). A population present Single-cell multiplex targeted pre-amplification and Q-PCR at less than the error rate for a given CNA assay cannot be defined by that single CNA alone, but if two CNA alterations define Labeled single cells were sorted into 2.5 mL of lysis buffer composed a subclonal population, it was deemed to be true. A single fusion of 1 mg/mL proteinase K (Qiagen) and 0.5% Tween 20 in HEPES- event or SNV can distinguish a minor subclonal population. For buffered saline (Sigma-Aldrich). Lysis was carried out for 50 min at example, in Case B, subclonal B6 is defined from the ancestral 60°C followed by 10 min at 98°C. Specific (DNA) targeted ampli- subclone B2 by a KRAS mutation. fication (STA) was then performed prior to Q-PCR. This multiplex STA reaction was composed of 5 mL of pre-amplification master mix (Life Technologies) and 2.5 mL of 1:40 primer mix (containing Interphase fluorescence in situ hybridization (FISH) all primers for the gene targets of interest). Denaturation was Methanol–acetic acid fixed cells, prepared by standard cytogenetic completed for 15 min at 95°C, followed by 24 cycles of amplifi- techniques, were used for all FISH studies. Interphase FISH for the cation for 15 sec at 95°C and for 4 min at 60°C. The STA product ETV6–RUNX1 fusion gene in combination with probes for regions was then diluted 1:6 using DNA suspension buffer (Teknova). Fi- of copy number alteration was performed using a commercial LSI nally, 2.7 mL of the single-cell target amplified DNA was in- ETV6-RUNX1 extra signal (ES) probe (Vysis, Abbott Laboratories) as terrogated by Q-PCR for each DNA target of interest using the previously described (Horsley et al. 2008; Bateman et al. 2010). The 96.96 dynamic microfluidic array and the BioMark HD as recom- P2RY8-CRLF2 gene fusion was identified using a dual color break- mended by the manufacturer; thermal phase for 1800 sec at 70°C, apart probe consisting of two bacterial artificial chromosome for 60 sec at 25°C, followed by a hot start phase of 60 sec at 95°C. (BAC) probes flanking the CRLF2 locus (Supplemental Table 5). This was followed by 35 cycles of 5 sec at 96°Cand20secat BAC and fosmid probes for this and other regions of interest were 60°C. CNA assays were completed in quadruplicates, and SNV or obtained from the BACPAC Resource Center (Children’s Hospital, fusion assays were completed in duplicates. Oakland Research Institute) (http://bacpac.chori.org). Probes were labeled by nick translation with either biotin-16-dUTP (Roche Single-cell Q-PCR analysis Ltd.), SpectrumRed, or SpectrumOrange (Vysis) and hybridized in

The BioMark HD generates a CT value for each reaction. A het- combination. FISH was performed by standard protocols (Horsley erozygous mutation was considered to be present if the signals et al. 2008; Bateman et al. 2010) with a single detection layer of from the mutant and wild-type sequence probes (FAM and VIC, streptavidin-Cy5 for biotinylated probes. Fluorescent signals were viewed using an Olympus AX2 fluorescence microscope equipped respectively) had a CT value <28 in a single cell. A homozygous mutation was considered to be present if there was no wild-type with narrow bandpass filters for FITC, SpectrumOrange, TexasRed, sequence signal. and Cy5. Images were captured and analyzed using a charge- To ensure robust DNA CNA data from a system that can be coupled device (Photometrics) and SmartCapture 3 software ver- influenced by assay efficiency and experimental variation, we used sion 3.0.4 (Digital Scientific UK). In each case, 100 nuclei were the DDCT method (Applied Biosystems) to determine a copy scored for the presence of the fusion gene and other targets of in- number for each locus with modifications to incorporate data from terest. Nuclei from karyotypically normal cells (peripheral blood three distinct assays targeting the control region (B2M) and the samples from normal individuals) were used to assess probe hy- region of interest. The DDCT value was calculated for every target bridization efficiency. The percentage of cells with the expected = gene assay using each of the three reference gene CT values gen- normal signal pattern was 97%–100% (mean 98%) for each probe erating nine estimated DNA copy number results for a region of (Supplemental Table 5). interest. A confidence metric was assigned to the estimated copy number inferring the confidence with which an estimated copy number could be deemed true (according to Applied Biosystems Phylogenetic analysis and clonal evolution CopyCaller Software v2) (details of this approach can be found in the Supplemental Material). The weighted mean of the nine esti- Clonal groups mated DNA copy numbers (for a region of interest) was used as the Copy numbers and genotypes for each interrogated cell were final DNA copy number taking into consideration the confidence concatenated in a linear array of nine characters and then aligned metric attributed to each. This reduced the contribution of less- to identify the same motif in the array. Cells sharing identical al- reliable estimated DNA copy numbers to the final DNA copy terations were grouped together in the same clonal group. Clones

Genome Research 2123 www.genome.org Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

Potter et al. were then aligned and used to infer the evolutionary history of Baslan T, Kendall J, Rodgers L, Cox H, Riggs M, Stepansky A, Troge J, Ravi K, each patient’s leukemia. Esposito D, Lakshmi B, et al. 2012. Genome-wide copy number analysis of single cells. Nat Protoc 7: 1024–1041. Bateman CM, Colman SM, Chaplin T, Young BD, Eden TO, Bhakta M, Maximum parsimony Gratias EJ, van Wering ER, Cazzaniga G, Harrison CJ, et al. 2010. Maximum parsimony searches were conducted using heuristic Acquisition of genome-wide copy number alterations in monozygotic twins with acute lymphoblastic leukemia. Blood 115: 3553–3558. searches. Heuristic searches were performed with a series of 1 Chin L, Andersen JN, Futreal PA. 2011. Cancer genomics: From discovery million random additional clones and tree branch-swapping using science to personalized medicine. Nat Med 17: 297–303. a bisection-reconnection (TBR) algorithm. All characters except Citri A, Pang ZP, Sudhof TC, Wernig M, Malenka RC. 2012. Comprehensive the initiating genetic lesion (ETV6-RUNX1 fusion gene) were qPCR profiling of gene expression in single neuronal cells. Nat Protoc 7: 118–127. weighted equally, and state graphs and step matrices were used to Clark J, Attard G, Jhavar S, Flohr P, Reid A, De-Bono J, Eeles R, Scardino P, assign equal costs to each character state transition in different Cuzick J, Fisher G, et al. 2008. Complex patterns of ETS gene alteration genes. The cost assigned for each transition is linear and results arise during cancer development in the human prostate. Oncogene 27: from the equation y = 2x . We assigned a transition cost equal to 1 1993–2003. i i Diaz LA Jr, Williams RT, Wu J, Kinde I, Hecht JR, Berlin J, Allen B, Bozic I, (y0 = x0) only to the transition 0 (no fusion) to 1 (one fusion) for the Reiter JG, Nowak MA, et al. 2012. The molecular evolution of acquired ETV6-RUNX1 fusion. The character state graphs and correspond- resistance to targeted EGFR blockade in colorectal cancers. Nature 486: ing matrices used are shown in Supplemental Table 6. The normal 537–540. clone (according to the alterations interrogated) found in each case Gerlinger M, Rowan AJ, Horswell S, Larkin J, Endesfelder D, Gronroos E, Martinez P, Matthews N, Stewart A, Tarpet P, et al. 2012. Intratumor was assumed to be the ancestral clone and included in the analysis heterogeneity and branched evolution revealed by multiregion as the root of the tree/trees. All parsimony analyses were performed sequencing. N Engl J Med 366: 883–892. using the computer software PAUP* version 4.0b10 for Linux Greaves M. 2013. Cancer stem cells as ‘units of selection.’ Evol Appl 6: (Swofford 2005). Trees were visualized using Dendroscope Soft- 102–108. Greaves M, Maley CC. 2012. Clonal evolution in cancer. Nature 481: ware version 3 (Huson and Scornavacca 2012). 306–313. Horsley SW, Colman S, McKinley M, Bateman CM, Jenney M, Chaplin T, Node support: jackknife Young BD, Greaves M, Kearney L. 2008. Genetic lesions in a preleukemic aplasia phase in a child with acute lymphoblastic leukemia. Genes Support for the internal branches was assessed in PAUP* by jack- Cancer 47: 333–340. knife with 1000 pseudo-replicates. Heuristic searches with ran- Huson DH, Scornavacca C. 2012. Dendroscope 3: An interactive tool for domly added taxa followed by applying tree bisection and recon- rooted phylogenetic trees and networks. Syst Biol 61: 1061–1067. nection algorithms were used for each jackknifing iteration deleting Kitagawa Y, Inoue K, Sasaki S, Hayashi Y, Matsuo Y, Lieber MR, Mizoguchi H, Yokota J, Kohno T. 2002. Prevalent involvement of illegitimate V(D)J 12.5% of the characters in each pseudo-replica. A jackknife 50% recombination in chromosome 9p21 deletions in lymphoid leukemia. majority-rule consensus tree (Margush and McMorris 1981) was J Biol Chem 277: 46289–46297. used to support the node of phylogenetic trees inferred. Klein CA, Stoecklein NH. 2009. Lessons from an aggressive cancer: Evolutionary dynamics in esophageal carcinoma. Cancer Res 69: Mimicking the bootstrap procedure 5285–5288. Klemm L, Duy C, Iacobucci I, Kuchen S, von Levetzow G, Feldhahn N, We wrote an R in-house script to mimic the bootstrap resampling Henke N, Li Z, Hoffmann TK, Kim YM, et al. 2009. The B cell mutator method. The script samples without replacement from the aligned AID promotes B lymphoid blast crises and drug resistance in chronic myeloid leukemia. Cancer Cell 16: 232–245. clones and generates replicas with columns shuffled in different Li H, Durbin R. 2010. Fast and accurate long-read alignment with Burrows- orders. Wheeler transform. Bioinformatics 26: 589–595. The script was run on 50 separate occasions for Cases A and B, Liu W, Laitinen S, Khan S, Vihinen M, Kowalski J, Yu G, Chen L, Ewing CM, using R software for statistical computing version 2.15 (R Core Eisenberger MA, Carducci MA, et al. 2009. Copy number analysis indicates monoclonal origin of lethal metastatic prostate cancer. Nat Team 2013). Each resulting replica was used as input for a maxi- Med 15: 559–565. mum parsimony search employing the same settings as described Longerich S, Basu U, Alt F, Storb U. 2006. AID in somatic hypermutation in the Supplemental Material. and class switch recombination. Curr Opin Immunol 18: 164–174. Maley CC, Galipeau PC, Finley JC, Wongsurawat VJ, Li X, Sanchez CA, Paulson TG, Blount PL, Risques RA, Rabinovitch PS, et al. 2006. Genetic Data access clonal diversity predicts progression to esophageal adenocarcinoma. Nat Genet 38: 468–473. The whole-exome sequencing data generated in this study Margush T, McMorris FR. 1981. Consensus n-trees. Bull Math Biol 43: have been submitted to The European Genome-phenome Archive 239–244. (EGA; http://www.ebi.ac.uk/ega/) under accession number Marusyk A, Almendro V, Polyak K. 2012. Intra-tumour heterogeneity: A looking glass for cancer? Nat Rev Cancer 12: 323–334. EGAD00001000636. Single nucleotide polymorphism and copy Merlo LMF, Pepper JW, Reid BJ, Maley CC. 2006. Cancer as an evolutionary number analysis by SNP array data have been submitted to the and ecological process. Nat Rev Cancer 6: 924–935. NCBI Gene Expression Omnibus (GEO; http://www.ncbi.nlm. Merlo LM, Shah NA, Li X, Blount PL, Vaughan TL, Reid BJ, Maley CC. 2010. nih.gov/geo/) under accession number GSE49215. A comprehensive survey of clonal diversity measures in Barrett’s esophagus as biomarkers of progression to esophageal adenocarcinoma. Cancer Prev Res (Phila) 3: 1388–1397. Mroz EA, Tward AD, Pickering CR, Myers JN, Ferris RL, Rocco JW. 2013. High Acknowledgments intratumor genetic heterogeneity is related to worse outcome in patients This work was supported by Leukaemia & Lymphoma Research with head and neck squamous cell carcinoma. Cancer 119: 3034–3042. Mullighan CG, Goorha S, Radtke I, Miller CB, Coustan-Smith E, Dalton JD, United Kingdom and the Kay Kendall Leukaemia Fund United Girtman K, Mathew S, Ma J, Pounds SB, et al. 2007. Genome-wide Kingdom. analysis of genetic alterations in acute lymphoblastic leukaemia. Nature 446: 758–764. Mullighan CG, Phillips LA, Su X, Ma J, Miller CB, Shurtleff SA, Downing JR. References 2008. Genomic analysis of the clonal origins of relapsed acute lymphoblastic leukemia. Science 322: 1377–1380. Anderson K, Lutz C, van Delft FW, Bateman CM, Guo Y, Colman SM, Mullighan CG, Collins-Underwood JR, Phillips LA, Loudin MG, Liu W, Kempski H, Moorman AV, Titley I, Swansbury J, et al. 2011. Genetic Zhang J, Ma J, Coustan-Smith E, Harvey RC, Willman CL, et al. 2009. variegation of clonal architecture and propagating cells in leukaemia. Rearrangement of CRLF2 in B-progenitor- and Down syndrome- Nature 469: 356–361. associated acute lymphoblastic leukemia. Nat Genet 41: 1243–1246.

2124 Genome Research www.genome.org Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

Genetic analysis of single cancer cells

Navin N, Krasnitz A, Rodgers L, Cook K, Meth J, Kendall J, Riggs M, Eberling Varela I, Tarpey P, Raine K, Huang D, Ong CK, Stephens P, Davies H, Jones D, Y, Troge J, Grubor V, et al. 2010. Inferring tumor progression from Lin ML, Teague J, et al. 2011. Exome sequencing identifies frequent genomic heterogeneity. Genome Res 20: 68–80. mutation of the SWI/SNF complex gene PBRM1 in renal carcinoma. Navin N, Kendall J, Troge J, Andrews P, Rodgers L, McIndoo J, Cook K, Nature 469: 539–542. Stepansky A, Levy D, Esposito D, et al. 2011. Tumour evolution inferred Vogelstein B, Papadopoulos N, Velculescu V, Zhou S, Diaz L, Kinzler K. 2013. by single-cell sequencing. Nature 472: 90–94. Cancer genome landscapes. Science 339: 1546–1558. Nik-Zainal S, Van Loo P, Wedge DC, Alexandrov LB, Greenman CD, Lau KW, Waanders E, Scheijen B, van der Meer LT, van Reijmersdal SV, van Emst L, Raine K, Jones D, Marshall J, Ramakrishna M, et al. 2012. The life history Kroeze Y, Sonneveld E, Hoogerbrugge PM, van Kessel AG, Van Leeuwen of 21 breast cancers. Cell 149: 994–1007. FN, et al. 2012. The origin and nature of tightly clustered BTG1 deletions Notta F, Mullighan CG, Wang JC, Poeppl A, Doulatov S, Phillips LA, Ma J, in precursor B-cell acute lymphoblastic leukemia support a model of Minden MD, Downing JR, Dick JE. 2011. Evolution of human BCR-ABL1 multiclonal evolution. PLoS Genet 8: e1002533. lymphoblastic leukaemia-initiating cells. Nature 469: 362–367. Wang J, Lin M, Crenshaw A, Hutchinson A, Hicks B, Yeager M, Berndt S, Nowell PC. 1976. The clonal evolution of tumor cell populations. Science Huang W-Y, Hayes R, Chanock S, et al. 2009. High-throughput single 194: 23–28. nucleotide polymorphism genotyping using nanofluidic dynamic Page RMD, Holmes EC. 1998. Molecular evolution: A phylogenetic approach. arrays. BMC Genomics 10: 561. Wiley-Blackwell, New York. Weaver S, Dube S, Mir A, Qin J, Sun G, Ramakrishnan R, Jones RC, Livak KJ. Park SY, Go¨nen M, Kim HJ, Michor F, Polyak K. 2010. Cellular and genetic 2010. Taking qPCR to a higher level: Analysis of CNV reveals the power diversity in the progression of in situ human breast carcinomas to an of high throughput qPCR to enhance quantitative resolution. Methods invasive phenotype. J Clin Invest 120: 636–644. 50: 271–276. Pina C, Fugazza C, Tipping AJ, Brown J, Soneji S, Teles J, Peterson C, Enver T. Wiemels JL, Greaves M. 1999. Structure and possible mechanisms of TEL- 2012. Inferring rules of lineage commitment in haematopoiesis. Nat Cell AML1 gene fusions in childhood acute lymphoblastic leukemia. Cancer Biol 14: 287–294. Res 59: 4075–4082. R Core Team. 2013. R: A language and environment for statistical computing. Wolman SR. 1986. Cytogenetic heterogeneity: Its role in tumor evolution. R Foundation for Statistical Computing, Vienna, Austria. http://www.R- Cancer Genet Cytogenet 19: 129–140. project.org/. Yates LR, Campbell PJ. 2012. Evolution of the cancer genome. Nat Rev Genet Stephens PJ, McBride DJ, Lin M-L, Varela I, Pleasance ED, Simpson JT, 13: 795–806. Stebbings LA, Leroy C, Edkins S, Mudie LJ, et al. 2009. Complex YeK,SchulzMH,LongQ,ApweilerR,NingZ.2009.Pindel:Apattern landscapes of somatic rearrangement in human breast cancer genomes. growth approach to detect break points of large deletions and medium Nature 462: 1005–1010. sized insertions from paired-end short reads. Bioinformatics 25: 2865– Swanton C. 2012. Intratumor heterogeneity: Evolution through space and 2871. time. Cancer Res 72: 4875–4882. Zong C, Lu S, Chapman AR, Xie XS. 2012. Genome-wide detection of single- Swofford D. 2005. PAUP*: Phylogenetic analysis using parsimony (and other nucleotide and copy-number variations of a single human cell. Science methods), Version 4.0, Beta 10. Sinauer, Sunderland, MA. 338: 1622–1626. Van Loo P, Nordgard SH, Lingjaerde OC, Russnes HG, Rye IH, Sun W, Weigman VJ, Marynen P, Zetterberg A, Naume B, et al. 2010. Allele- specific copy number analysis of tumors. Proc Natl Acad Sci 107: 16910– 16915. Received May 2, 2013; accepted in revised form September 17, 2013.

Genome Research 2125 www.genome.org Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

Single-cell mutational profiling and clonal phylogeny in cancer

Nicola E. Potter, Luca Ermini, Elli Papaemmanuil, et al.

Genome Res. 2013 23: 2115-2125 originally published online September 20, 2013 Access the most recent version at doi:10.1101/gr.159913.113

Supplemental http://genome.cshlp.org/content/suppl/2013/10/04/gr.159913.113.DC1 Material

References This article cites 47 articles, 11 of which can be accessed free at: http://genome.cshlp.org/content/23/12/2115.full.html#ref-list-1

Creative This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the Commons first six months after the full-issue publication date (see License http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported), as described at http://creativecommons.org/licenses/by-nc/3.0/.

Email Alerting Receive free email alerts when new articles cite this article - sign up in the box at the Service top right corner of the article or click here.

To subscribe to Genome Research go to: https://genome.cshlp.org/subscriptions

© 2013 Potter et al.; Published by Cold Spring Harbor Laboratory Press