Characterizing heterogeneous cellular responses to perturbations

Michael D. Slack^a,b, Elisabeth D. Martinez^a,c, Lani F. Wu^a,b,1, and Steven J. Altschuler^a,b,1

^aDepartment of Pharmacology, ^bGreen Center for Systems Biology, Simmons Cancer Center, and the ^cHamon Center for Therapeutic Oncology Research, University of Texas Southwestern Medical Center, Dallas, TX 75390

Edited by Alfred G. Gilman, University of Texas Southwestern Medical Center, Dallas, TX, and approved October 16, 2008 (received for review July 20, 2008)

Cellular populations have been widely observed to respond heterogeneously to perturbation. However, interpreting the observed heterogeneity is an extremely challenging problem because of the complexity of possible cellular phenotypes, the large dimension of potential perturbations, and the lack of methods for separating meaningful biological information from noise. Here, we develop an image-based approach to characterize cellular phenotypes based on patterns of signaling marker colocalization. Heterogeneous cellular populations are characterized as mixtures of phenotypically distinct subpopulations, and responses to perturbations are summarized succinctly as probabilistic redistributions of these mixtures. We apply our method to characterize the heterogeneous responses of cancer cells to a panel of drugs. We find that cells treated with drugs of (dis-)similar mechanism exhibit (dis-)similar patterns of heterogeneity. Despite the phenotypic diversity of cells observed within our data, low-complexity models of heterogeneity were sufficient to distinguish most classes of drug mechanism. Our approach offers a computational framework for assessing the complexity of cellular heterogeneity, investigating the degree to which perturbations induce redistributions of a limited, but nontrivial, repertoire of underlying states and revealing functional significance contained within distinct patterns of heterogeneous responses.

automated microscopy | cellular heterogeneity | image analysis

Author contributions: M.D.S., E.D.M., L.F.W., and S.J.A. designed research; M.D.S. and E.D.M. performed research; M.D.S. analyzed data; and M.D.S., E.D.M., L.F.W., and S.J.A. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

^1To whom correspondence may be addressed at: Department of Pharmacology, University of Texas Southwestern Medical Center, 6001 Forest Park Road, ND9.214, Dallas, TX 75390. E-mail: [email protected] or [email protected].

This article contains supporting information online at www.pnas.org/cgi/content/full/0807038105/DCSupplemental.

© 2008 by The National Academy of Sciences of the USA

19306–19311 | PNAS | December 9, 2008 | vol. 105 | no. 49   www.pnas.org/cgi/doi/10.1073/pnas.0807038105

CELL BIOLOGY

Phenotypic heterogeneity in cellular populations has been observed in diverse physiological processes (1–9), pathophysiological conditions (10, 11), and responses to therapeutics (10, 12, 13). However, interpreting cellular heterogeneity, and its changes in response to perturbations, has been an extremely challenging problem. Does heterogeneity contain biologically (or clinically) important information? If so, what level of resolution is required to extract this information?

The complexity of cellular heterogeneity is not yet well understood: At one extreme, heterogeneity may reflect biological "noise" about a single "average" phenotypic state, whereas at the other extreme, it may reflect unbounded complexity, with each cell representing a distinct state. An intermediate alternative, which we investigate here, is that cellular populations can be well described as mixtures of a limited number of phenotypically distinct subpopulations. A practical challenge is to determine the complexity of the observed cellular phenotypes and develop methods for identifying patterns of cellular heterogeneity.

Here, we developed methods for characterizing patterns of spatial heterogeneity observed within cellular populations. To capture cellular heterogeneity, we made use of high-content image-based assays, an increasingly powerful approach for capturing cellular responses to perturbations and facilitating the objective quantification of cellular phenotypes (14–28). [An interesting alternative, not considered here, is to study heterogeneity captured by fluorescence-activated cell sorting (FACS), which provides increased numbers of readouts per cell, but without spatial resolution (29).] First, we applied image-processing algorithms to extract phenotypic measurements of the activation and colocalization patterns of cellular readouts from large numbers of cells in diverse conditions. Next, we used unsupervised clustering algorithms to identify phenotypic "stereotypes" within the overall population and assign probabilities to cells belonging to subpopulations modeled on these stereotypes. Subpopulation identification is based on application of a Gaussian mixture model (GMM), which has been applied recently in the context of online phenotype discovery (30). Finally, for each population (or condition), we estimated the fraction of cells in each subpopulation and summarized the results as a probability vector. The resulting "subpopulation profiles" are human- and machine-interpretable and enable the comparison of heterogeneous responses across multiple physiological, pathological, or environmental perturbations.

To develop our approach, we analyzed existing and new image databases of drug-treated HeLa cells. Drug treatments provided reliable and well-characterized mechanistic classes of perturbations that could be varied in both time and dose. Furthermore, existing drug classifications provided a biologically motivated method for estimating a crucial property of cellular heterogeneity: namely, whether increasing numbers of subpopulations provided improved ability to distinguish different mechanistic classes of perturbations. [Traditionally, the tradeoffs between model fit and complexity are addressed by a nonbiological criterion, such as the Bayesian Information Criterion (BIC) or gap statistics (30).] Thus, drug perturbations and their classifications provided an unbiased criterion for determining the degree of complexity required to interpret patterns of cellular heterogeneity.

Results

To study the heterogeneous effects of perturbations on cells, we made use of an existing image database of HeLa cells treated with 100 drugs at 13 concentrations and stained by using a multiplexed collection of immunofluorescent markers (DNA/SC35/anillin) (25). However, this dataset did not contain temporal information and additionally lacked consistent numbers of representative drugs in several key categories of drug mechanism. Thus, we created a 25-drug dataset capturing the responses of HeLa cells to drugs at varying concentrations [low or high, as specified in supporting information (SI) Table S1] and time points (3, 6, 12, or 24 h). For this dataset, we selected 5 drugs in each of 5 known functional categories, affecting DNA replication (e.g., methotrexate), histone deacetylation (e.g., ), microtubules (e.g., paclitaxel), glucocorticoid receptors (e.g., dexamethasone), and topoisomerase (e.g., doxorubicin). We additionally included an "other" category containing 10 drugs of miscellaneous (e.g., dichloroacetic acid) or unspecified (e.g., green tea-extracted polyphenols) mechanism. After fixation, cells were stained with 1 of 2 sets of fluorescent markers (DNA/phospho-p38/phospho-ERK or DNA/actin/α-tubulin; see Table S2) selected as general readouts of cell signaling state (25).

[Figure 1 image: panels A (image database), B (feature space; axes PC 1 and PC 2), and C (reference subpopulation models for the DNA/SC35/anillin, DNA/p-p38/p-ERK, and DNA/actin/α-tubulin marker sets; stacked bars for subpopulations S1–S4 on a 0–100% scale).]

Fig. 1. Heterogeneous cellular populations can be characterized as mixtures of subpopulations with distinct phenotypes. For convenience of visualization, results are shown for k = 4 subpopulations. (A and B) Illustration of approach. (A) HeLa cells are treated with drugs, labeled with fluorescence markers, and imaged. Individual cellular regions are identified. (B) A high-dimensional vector of phenotypic features is extracted from each cell. Cells are assigned probabilities of belonging to subpopulations that have been identified by using a GMM. Scatter plot: cells are visualized as points via feature representation and PCA reduction to two dimensions; colors of cell outlines in A and points in B correspond to most probable subpopulation assignment; ellipses represent unit standard deviation from the mean for each subpopulation. Shown are 3 representative cells taken from various locations in each ellipse. (C) Shown are subpopulation models for different marker sets. Models are built from "reference" sets of cells sampled from multiple conditions. Scatter plots are as in B. Stacked bar graphs represent (prior) probabilities of cell membership in each subpopulation. Sample cells from within 1 standard deviation of the mean phenotype are shown at right for each subpopulation (image colors correspond to markers in order; e.g., blue/green/red corresponds to DNA/SC35/anillin). (Scale bar: 20 μm.)

For each image, we identified individual cellular regions and extracted phenotypic features (Fig. 1 A and B). There are multiple approaches for extracting phenotypic information, including detecting either known biological phenotypes [e.g., apoptosis (31) or cell cycle phases (32)] or general image properties [e.g., cell morphology and patterns of marker intensities (14, 18)]. Here, we found that a feature set designed to measure the degree of marker colocalization performed well in capturing phenotypic differences in signaling state among visually distinct cells. First, we measured ratios of marker intensities at each pixel to estimate the degree of marker colocalization. To avoid bias due to selection of a "favored" denominator, we measured all 3 possible pairs of marker ratios. Second, we quantized marker ratios into 16 bins. (Bin ranges were empirically chosen based on our data; the highest bin value was set to "infinity" to capture ratios when marker intensities in the denominator were close to background.) Third, we used six 16 × 16 histograms to tally each of the 3 quantized marker ratio pairs for both cytoplasmic and nuclear subcellular regions. To downweight low-intensity pixels, the contribution of each pixel to the histogram was weighted by its total intensity of markers. Finally, each histogram was normalized to have unit mass.

Taken together, these histograms resulted in a 1,536-dimensional (equal to 6 × 16 × 16) feature vector. These high-dimensional feature vectors were subsequently reduced to ≈25 dimensions by principal-components analysis (PCA) (Materials and Methods). Although alternative choices of cellular features could certainly be substituted in the remaining analysis, we found that these ratiometric features performed well at grouping cells with visually similar signaling states in similar regions of feature space.

To enable comparisons of cellular heterogeneity across experimental conditions, a reference training set of 10,000 cells was constructed for each marker set from a uniform sampling of control and drug-treated wells (Materials and Methods). This sample represented only a small fraction (<2%; see Materials and Methods) of our overall datasets. Probabilistic clustering was then applied to this training set to characterize cell heterogeneity as stochastic variation of feature vectors around a collection of distinct phenotypes (Fig. 1 B and C). For a specified number of subpopulations k, expectation maximization (EM) clustering (33) characterized the total population as a weighted mixture of k subpopulations, each defined by a Gaussian distribution around a distinct cellular stereotype and a (prior) overall probability of subpopulation occurrence (30) (Fig. 1C, stacked bar graph). (For convenience of visualization we illustrate our approach with k = 4; the effects of varying k will be discussed subsequently.) The reference subpopulation model is thus defined by this collection of k phenotype means, covariances, and prior probabilities of occurrence.

The heterogeneity of a cell population subjected to a specific experimental condition can be estimated by using the reference subpopulation model. For each cell, the (posterior) probability that it is a member of a given subpopulation can be computed from the subpopulation model by using Bayes' rule. These probabilities for a single cell and all k subpopulations can be represented as a k-dimensional vector whose entries sum to 1. By averaging these
k-dimensional vectors over a population of cells, a subpopulation profile is obtained as the expected overall proportion of each subpopulation for a given experimental condition. Replicates produced reproducible subpopulation profiles (Fig. S1), which were averaged to obtain 1 final profile per condition. Thus, the heterogeneity observed within any experimental condition can be concisely encoded by a probability distribution over the reference subpopulations.

[Figure 2 image: panel A, stacked bars of subpopulation proportion (S1–S4, 0–100%) vs. dose (1–13) for control, camptothecin, colchicine, and nocodazole (DNA/SC35/anillin); panel B, subpopulation proportion vs. time (3, 6, 12, 24 h) for control, scriptaid, aldosterone, and docetaxel (DNA/p-p38/p-ERK); panel C, heat map of subpopulation proportion (0% to ≥60%) for subpopulations S1–S4 across 25 drugs labeled by category (DNA/p-p38/p-ERK).]

Fig. 2. Subpopulation profiles allow quantitative comparison of heterogeneous cellular responses. (A) Subpopulation profiles reveal dose-dependent heterogeneous responses to drugs [DNA/SC35/anillin marker set, 20-h drug exposure (25)]. Thirteen concentrations are shown, increasing from left to right, at 3-fold titration. (B) Subpopulation profiles reveal time-dependent heterogeneous responses to drugs. Subpopulation profiles for selected drugs are combined in time series (DNA/p-p38/p-ERK marker set; concentrations are given in Table S1). Profiles in A and B are based on reference models computed with k = 4 subpopulations. Colors and subpopulation IDs S1–S4 on left-most images correspond to Fig. 1C. (C) Drugs of similar mechanism often induce similar subpopulation profiles. Subpopulation profiles for 25 drugs at the 24-h time point and DNA/p-p38/p-ERK marker set are displayed in a heat map and grouped by category (D, DNA replication; H, histone deacetylase; M, microtubule; G, glucocorticoid receptor; T, topoisomerase). Drugs with <200 cells per well are marked with asterisks (mitomycin C and trifluoperazine).

We next examined the degree to which subpopulation profiles can be used to distinguish heterogeneous responses to different drugs (Fig. 2A). We first compared subpopulation profiles for drug-treated conditions to controls. A comparison of camptothecin (topoisomerase) to controls, for example, showed a significant shift to a dominant phenotype (Fig. 2A; e.g., dose 5, population 3). Comparison of colchicine and nocodazole (both targeting microtubules) to controls showed substantial redistributions of phenotypic heterogeneity after drug treatment (Fig. 2A; e.g., dose 9, population 2). We next compared the effects of drugs to one another. We found that drugs of similar mechanism often yielded similar profiles (e.g., Fig. 2A, colchicine vs. nocodazole). Conversely, mechanistically different drugs yielded distinct profiles (e.g., Fig. 2A, camptothecin vs. colchicine).

Dose responses typically revealed sharp transitions in subpopulation proportions as drug concentration increased, whereas untreated populations tended to remain in constant proportions (Fig. 2A). Multiphasic drug effects, such as seen for camptothecin, can be easily discerned as dramatic transitions in subpopulation proportions [Fig. 2A, second image, transitions at doses 5 and 10 in agreement with previous results (18, 25)]. The interpretation of results across time is complicated somewhat by the fact that controls did not maintain constant proportions, likely because of varying exposure times to the drug vehicle DMSO before fixation (Fig. 2B, first image). Although most drugs did not show significant differences from the controls at the 3- and 6-h time points, significant variation was observed at the 12- and 24-h time points (Fig. 2B). We note that analysis of automatically identified subpopulations revealed no significant enrichment for any specific cell cycle stage, as might be expected from our choice of pixel-based features (SI Text and Fig. S2). These observations suggest that subpopulation profiles contain information that may offer insights into drug modes of action.

We next wondered whether the significant redistributions of subpopulations were consistent across drug perturbations of similar mechanism. To assess the "information content" of observed patterns of heterogeneity, we computed subpopulation profiles for our 25 annotated drugs at the 24-h time point and grouped the results by category of mechanism (Fig. 2C). (Here, we visualized these results using a heatmap, rather than stacked bar graphs, to provide easier visual comparison among larger numbers of drugs.) Whereas DNA replication-inhibiting drugs showed high variability in their subpopulation proportions, the responses of drugs within all other categories generally showed high concordance. Rigorous assessment of the similarities and differences between drug effects requires a quantitative measure of "distance" between subpopulation profiles. When considered as probability vectors, subpopulation profiles may be viewed as information theoretic descriptions of cell population heterogeneity. Similarities among profiles can be quantitatively compared by using the Kullback–Leibler (KL) divergence (34) and visually displayed by using multidimensional scaling (MDS) plots (SI Text). We observed that profiles showed significant grouping by drug mechanism (Fig. 3A, k = 4). Thus, drugs of similar mechanism appear to induce similar patterns of cellular heterogeneity.

[Figure 3 image: panel A, MDS plots for k = 2, 4, 10, and 100 (DNA/p-p38/p-ERK; axes PC 1 and PC 2, A.U.), with points numbered by drug: 1, apicidin; 2, M344; 3, MS-275; 4, scriptaid; 5, trichostatin A; 6, colchicine; 7, docetaxel; 8, nocodazole; 9, paclitaxel; 10, vinorelbine; 11, aldosterone; 12, cortisol; 13, dexamethasone; 14, prednisolone; 15, RU486. Panel B, performance vs. number of subpopulations (k = 1 to 100) for the DNA/p-p38/p-ERK and DNA/actin/α-tubulin marker sets, by category (DNA replication, histone deacetylase, microtubule, glucocorticoid receptor, topoisomerase; other categories: microtubule destabilizers, microtubule stabilizers), plus an MDS plot (axes in A.U.).]

Fig. 3. A small number of phenotypic subpopulations can distinguish drug effects by mechanism. (A) As the number of subpopulations in the reference model increases (k = 2, 4, 10, and 100), MDS plots (40) reveal relative similarities among drugs (computed by using KL divergence) within 3 mechanism categories (H, blue diamonds; M, green triangles; G, orange stars). Axes are labeled in arbitrary units. (B) Drug classification performance is computed for each value of k (and drug category; see also Table S5). Performance is measured by −log10 of the P value for a rank-sum statistic [Mann–Whitney U test (39)] obtained by comparing distributions of intra- and intercategory subpopulation profile similarity scores. Profile similarity is computed for 12- and 24-h time points of low drug concentration and combined by rank addition. Dotted lines indicate significance threshold corresponding to P = 0.05. Black arrowheads (Left) (k = 2, 4, 10, 100) correspond to MDS plots in A. Black arrowheads (Right) (k = 7) correspond to MDS plot at bottom right and uses reference model shown in Fig. S1D.

A natural question arose from our analysis: Would increasing the number of subpopulations, k, in the reference model discern finer distinctions between drug effects and thus yield a more accurate classification of drugs? In theory, increasing k allows finer resolution at which to measure cellular heterogeneity, although classification accuracy for very large values of k may fail to improve, or even degrade, because of the introduction of nonbiological separation of cellular phenotypes. We found for the DNA/p-p38/p-ERK and DNA/actin/α-tubulin marker sets that categorization performance for most drug categories sharply improved as k increased from 1 (no predictive power) to ≈5. Afterward, no significant improvement in classification was observed (Fig. 3B). In a few cases, categorization performance was poor. For example, the DNA/actin/α-tubulin marker set showed poor performance for the microtubule category (Fig. 3B Middle). However, as previously shown, this poor performance was due to the fact that the microtubule category was too coarse (18); in fact, subpopulation profiles did distinguish between microtubule-stabilizing taxanes (paclitaxel and docetaxel) and drugs that destabilize microtubules (colchicine, vinorelbine, and nocodazole) (Fig. 3B Right; Fig. S1D). Thus, a small number of subpopulations were sufficient to give accurate classification for most drug categories and both marker sets.

We extended our analysis to include the additional 10 drugs of unspecified or miscellaneous categorization (Table S1). MDS plots for the DNA/p-p38/p-ERK marker set showed these additional drugs to be interspersed with glucocorticoid receptor drugs (Fig. S3, k = 4). We hypothesized that glucocorticoid receptor ligand-like effects would be more likely to be observed for compounds more proximal to drugs in this category. Of 4 proximal compounds tested, 2 induced cytoplasmic-to-nuclear translocation of a GFP-tagged glucocorticoid receptor (35) in COS7 cells (green tea polyphenols and valproic acid), whereas 2 did not ( and azelaic acid) (Fig. S4). Neither of the more distal compounds tested (mitoxantrone and methoxyacetic acid) induced glucocorticoid receptor translocation. Of note, the HDAC drug MS-275 closest to the glucocorticoid receptor group also induced translocation, whereas the less proximal HDAC drug trichostatin A did not (data not shown). Interestingly, these insights were derived by using a general signaling marker set, not specific to mechanisms of nuclear receptor regulatory pathways. These results suggest that analyzing patterns of cellular heterogeneity may be useful for generating mechanistic insights into modes of drug action.

Although our approach effectively captured phenotypic stereotypes common to the entire collection of drug perturbations, we wondered to what extent exceptional (i.e., rare or unmodeled) phenotypes may have been missed (30). To test for exceptional phenotypes in our dataset, we evaluated goodness of model fit for all 35 of our drugs and all treatment conditions based on the ratio of log likelihoods for drug treatment vs. the reference training data (Materials and Methods). We found that only a relatively small number of drug treatment conditions did not fit the reference model reasonably well. To ensure that results were not biased by the inclusion of all 35 drug treatments in our original model, we recomputed model fit using a new reference model based on data sampled from controls and only 5 low-concentration drug treatments, 1 from each category of mechanism (Fig. 4A).

Examining the 6 drugs whose model fit scores were lowest revealed 2 types of exceptional phenotypes (Fig. 4B): (i) microtubule-destabilizing drugs (colchicine, nocodazole, vinorelbine) whose phenotypes were similar to each other but different from most of the remaining drugs; and (ii) 3 drugs at high concentrations (doxorubicin, green tea-extracted polyphenols, methoxyacetic acid) whose phenotypes were visually much different from those seen for any other drug treatment condition. Although colchicine-treated cells were included in the reference model, the phenotypes presented at the 24-h time point were in too low a proportion to constitute 1 of the 4 modeled subpopulations; the other exceptional phenotypes were likely not present in the reference sample. Drugs that induced a poorly modeled heterogeneous response in one marker set tended to also do so in the other marker set. Depending on the application, there are a variety of (supervised and unsupervised) strategies for updating or rebuilding a reference subpopulation model with exceptional phenotypes, such as iterative identification and merging of novel phenotypes in large-scale screens (30). Phenotypic complexity within a collection of perturbations can thus be estimated in a relative sense (i.e., the number of subpopulations required to distinguish the perturbations) or in an absolute sense (i.e., the total number of distinct subpopulations).

Discussion

Phenotypic heterogeneity is often ignored or viewed as an impediment to understanding the effects of perturbations on cellular populations. Traditional measurements of a mean and a variance typically do not capture the complexity of cellular responses to perturbations often observed in microscopy. On the other hand, cell-to-cell variation suggests a potential for limitless cellular complexity. Analysis of our datasets suggested that the effects of perturbations may be characterized largely as redistributions of a limited, but nontrivial, repertoire of underlying states. Subpopulation profiles provided an intuitive and computationally tractable
method for identifying functional significance from patterns of heterogeneity. Analysis of subpopulation profiles revealed that drugs of (dis-)similar mechanism induced (dis-)similar patterns of subpopulation redistributions. Thus, our work suggests that patterns of heterogeneity can be quantified and used to extract meaningful biological information.

[Figure 4 image: panel A, heat maps of model fit score (0, poor, to ≥1, good; gray, <700 cells/well) at 3, 6, 12, and 24 h for low and high concentrations, for the DNA/p-p38/p-ERK and DNA/actin/α-tubulin marker sets. Rows: control, 3-O-acetyl-B-boswellic acid, aldosterone, aphidicolin, apicidin, arctigenin, azelaic acid, camptothecin, colchicine, compound A, cortisol, dichloroacetic acid, dexamethasone, docetaxel, doxorubicin, etoposide, ganciclovir, gemcitabine, green tea extracted polyphenols, homocysteine thiolactone, M344, methotrexate, methoxyacetic acid, mitomycin C, mitoxantrone, MS-275, nicotinamide, nocodazole, prednisolone, RU486, scriptaid, paclitaxel, trichostatin A, trifluoperazine, valproic acid, vinorelbine. Panel B, sample cell images labeled a–c and 1–6 for each marker set.]

Fig. 4. Exceptional phenotypes are detected by evaluating the fit of each drug to a reference model built by using only the low concentration of 1 drug in each functional category (control, aldosterone, aphidicolin, apicidin, colchicine, and doxorubicin) at all time points. (A) Reference models for DNA/p-p38/p-ERK and DNA/actin/α-tubulin marker sets are both computed with k = 4 subpopulations. Heat map reflects the ratio of model fit (log likelihood) of drug treatment data to training data. Four time points (3, 6, 12, and 24 h) are displayed for each of 2 marker sets, at both low and high concentrations of each drug. Gray boxes indicate poor fits due to low cell counts (<700 cells per well). (B) Shown are sample images of drug treated cells that fit the reference model well (a–c), as well as the 6 lowest scoring drugs (1–6) as indicated in A.

Our approach for grouping cells is unsupervised, and the identification of subpopulations depends on the composition of the dataset, the markers, the extracted features, and the number of subpopulations. In some cases, subpopulation phenotypes and their redistributions may be interpretable. For example, the microtubule-destabilizing drugs vinorelbine and nocodazole induced a decrease in subpopulations characterized by α-tubulin-positive staining. Our approach also identifies subpopulations whose phenotypes cannot be easily interpreted by current biological knowledge. For example, it is unclear how aldosterone induces an increase in subpopulation 3, characterized by weak diffused p-ERK staining concomitant with clear p-p38 nuclear staining. In general, the best method for interpreting subpopulations and their redistributions may be through relative comparisons with profiles of perturbations with known mechanisms.

GMMs provide a reasonable first approximation for grouping cells in high-dimensional feature space. However, certain subpopulations may still display some degree of phenotypic heterogeneity (e.g., Fig. 1C Upper, subpopulation 2). This may be due to several factors, including: Not all subpopulations will be well modeled by Gaussian distributions; some subpopulations may cover large regions of feature space and need further subdivisions; and automated grouping based on phenotypic features may not agree with visual grouping. Model parameters in future studies of heterogeneity may be optimized to improve visual tightness of identified subpopulation phenotypes.

Our analysis suggests that observed heterogeneity may reflect variation around a discrete set of phenotypic states. Our work provides a framework for future studies with larger sample sizes, greater numbers of perturbations, and higher numbers of markers to rigorously test questions about the nature of heterogeneity, including: Do unperturbed cells occupy essentially the same phenotypic space as cells with significantly perturbed pathways? And, do perturbations induce redistributions among a limited repertoire of underlying stereotypical states? We envision that the ability to identify and interpret information contained in patterns of cellular heterogeneity will provide insights into physiology and disease that may be missed by traditional population-averaged or small-cell-number studies.

Materials and Methods

Drugs and Concentrations. The drug-dose data of 100 compounds in 13 3-fold serial dilution are as previously reported (25). The new time-course data consisted of 35 compounds at 2 concentrations (see Table S1).

Cell Culture. For the time-course data, HeLa cells were diluted to 5,000 cells per 50 μl of media and plated onto 384-well glass-bottom plates. Plates were incubated at room temperature for 30 min to minimize edge effects in wells (36), then placed in a 37 °C/5% CO2 incubator overnight. Cells were fixed at 3, 6, 12, and 24 h after drug treatment. See SI Text for details about cell culture, drug treatment, and plate layout.

www.pnas.org/cgi/doi/10.1073/pnas.0807038105

Fluorescent Markers. The marker set for the drug-dose data (DNA/SC35/anillin) is as previously reported (25). For the time-series data, 2 sets of fluorescent markers were used: (DNA/p-p38/p-ERK) and (DNA/actin/α-tubulin). See SI Text for details about staining protocol.

Image Background Correction and Cell Segmentation. Image background correction was done by using the ImageJ background subtraction rolling-ball algorithm (37), with rolling-ball size set to 500 pixels. Original image dimensions were 1,392 × 1,040 pixels. Cell segmentation methods used to define DNA and non-DNA (cytoplasmic) regions are as in ref. 18 and are summarized in SI Text.

Feature Extraction. For each cell, a 1,536-dimensional feature vector was computed based on histograms of the ratios of marker intensities per pixel, in both cytoplasmic and nuclear cellular regions. See Results and SI Text.

Feature Data Sampling. Feature data from multiple wells were drawn by using sampling with replacement, with weights assigned to each well according to the proportionate number of total cells detected within the well. The total number of cells (training data) sampled from all plates for computing PCA coordinate transformations and subpopulation reference models was 10,000 (≈2% of cells for DNA/p-p38/p-ERK and DNA/actin/α-tubulin marker sets, and ≈0.06% of cells for DNA/SC35/anillin marker set). For the DNA/p-p38/p-ERK and DNA/actin/α-tubulin marker sets, all controls and low concentration drug treatments at all 4 time points were sampled. For the DNA/SC35/anillin marker set, all wells were sampled.

PCA Coordinate Transformations. Principal-components analysis (PCA) was used to reduce the dimensionality of feature data from 1,536 to 27 for the DNA/p-p38/p-ERK marker set, and to 26 for the DNA/actin/α-tubulin marker set. The choice of dimension was made for each marker set by randomizing the order of feature dimensions for each sampled cell and computing the eigenvalues of the resulting (randomized feature) covariance matrix. Subsequent modeling and profiling computations all used the PCA reduced data.

Reference Models. Subpopulation reference GMM models were computed from sampled feature data by using the EM algorithm (33). Code was based on the publicly available Netlab Toolbox. For each model, EM clustering was run 10 times, starting from a K-means clustering (38) using randomly chosen means. The final clustering with the best log-likelihood value was chosen as the subpopulation reference model. To eliminate the influence of convergence failures, each run was attempted up to 5 times with new initial conditions until convergence was reached. For each marker set, subpopulation models were computed for each number of subpopulations in the set (2,3,4,…,18,19,20,30,40,50,100).

Subpopulation Profiles. Subpopulation profiles for each drug-treatment replicate (single well) were computed by averaging the posterior probabilities for the corresponding subpopulation model for 1,000 samplings (with replacement) consisting of PCA reduced feature data from 1,000 cells per sample. Profiles for each drug-treatment condition were computed by an average of profiles for each replicate well, averaged by the number of cells observed in each well. When exactly 1 of the 2 replicate wells had <1,000 cells in total, the profile for the other well was used.

Similarity Comparison. Individual profiles were compared by using a symmetrized version of Kullback-Leibler divergence (34). In particular, given subpopulation profiles P and Q for a model with k subpopulations, the similarity measure S(P,Q) is defined as

\[ S(P,Q) = \frac{\mathrm{KL}(P,Q) + \mathrm{KL}(Q,P)}{2}, \tag{1} \]

where

\[ \mathrm{KL}(P,Q) = \sum_{i=1}^{k} P_i \log_2\!\left(\frac{P_i}{Q_i}\right). \tag{2} \]

MDS Plots. The Matlab mdscale function was used to compute data for multidimensional scaling plots (40). Kruskal's stress-1 metric (total error in similarity representation) was computed to be <0.05 for each plot in Fig. 3A, and ≈0.12 for the plot in Fig. 3B.

Identification of Exceptional Phenotypes. For a given drug treatment condition, the model fit score was computed as the ratio of the model log likelihoods for drug treatment data vs. reference training data. Samplings of 1,000 cells were used to compute log likelihoods.

ACKNOWLEDGMENTS. We thank Michael Johnson (Nikon), Jurg Rohrer (BD Biosciences), and all members of the S.J.A. and L.F.W. laboratory for extremely helpful discussions and feedback, and the anonymous reviewers for their helpful suggestions. This work was supported by National Institutes of Health Grants R01 GM081549 (to L.F.W.), R01 GM071794 (to S.J.A.), K22 CA118717-01 (to E.D.M.), and P50 CA70907 UT SPORE in Lung Cancer; the Welch Foundation (I-1619 and I-1644 to L.F.W. and S.J.A.); and the University of Texas Southwestern Endowment for Scholars in Biomedical Research (to L.F.W. and to S.J.A.).
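As an illustration of the similarity comparison described in Materials and Methods (Eqs. 1 and 2), the following minimal Python sketch computes the symmetrized Kullback-Leibler measure between two subpopulation profiles. It is not the original implementation. The function names and the example profiles are hypothetical, and the small floor `eps` applied to zero entries of the second profile is an assumption made here to keep the logarithm finite; the text does not state how zero probabilities were handled.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(P, Q) = sum_i P_i * log2(P_i / Q_i), following Eq. 2.

    eps is an assumed floor for Q_i (not taken from the paper) so that
    a zero entry in Q does not cause a division by zero.
    """
    if len(p) != len(q):
        raise ValueError("profiles must have the same number of subpopulations")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:  # terms with P_i = 0 contribute 0 by convention
            total += p_i * math.log2(p_i / max(q_i, eps))
    return total


def symmetrized_kl(p, q, eps=1e-12):
    """S(P, Q) = (KL(P, Q) + KL(Q, P)) / 2, following Eq. 1."""
    return 0.5 * (kl_divergence(p, q, eps) + kl_divergence(q, p, eps))


# Hypothetical profiles over k = 3 subpopulations (each sums to 1).
control = [0.5, 0.25, 0.25]
treated = [0.25, 0.25, 0.5]

print(symmetrized_kl(control, control))  # identical profiles give 0.0
print(symmetrized_kl(control, treated))  # 0.25 for these example values
```

Because S(P,Q) is symmetric and equals zero for identical profiles, a matrix of pairwise values can be passed directly to a multidimensional scaling routine as a dissimilarity matrix.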
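The subpopulation profile computation described in Materials and Methods can be sketched in a similar way. The Python example below assumes that per-cell posterior probabilities have already been obtained from a fitted reference GMM; it averages them over repeated samplings with replacement for a single well, and then combines replicate wells. The combination step interprets "averaged by the number of cells observed in each well" as a cell-count-weighted mean, which is an assumption. The function names, parameters, and toy posteriors are hypothetical, and the rule for wells with fewer than 1,000 cells is not shown.

```python
import random


def well_profile(posteriors, n_samplings=1000, cells_per_sample=1000, seed=0):
    """Average posterior probabilities over bootstrap samplings of one well.

    posteriors: list of per-cell lists, each holding k posterior
    probabilities (assumed to come from a fitted reference model).
    """
    rng = random.Random(seed)
    k = len(posteriors[0])
    totals = [0.0] * k
    for _ in range(n_samplings):
        # Draw cells with replacement from this well.
        sample = rng.choices(posteriors, k=cells_per_sample)
        for cell in sample:
            for j in range(k):
                totals[j] += cell[j]
    n_drawn = n_samplings * cells_per_sample
    return [t / n_drawn for t in totals]


def condition_profile(well_profiles, cell_counts):
    """Cell-count-weighted mean of replicate-well profiles (assumed weighting)."""
    total_cells = sum(cell_counts)
    k = len(well_profiles[0])
    return [
        sum(w[j] * c for w, c in zip(well_profiles, cell_counts)) / total_cells
        for j in range(k)
    ]


# Toy posteriors for two replicate wells and k = 2 subpopulations.
well_a = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
well_b = [[0.2, 0.8], [0.3, 0.7]]

# Small sampling sizes keep the demonstration fast.
profile_a = well_profile(well_a, n_samplings=50, cells_per_sample=100, seed=1)
profile_b = well_profile(well_b, n_samplings=50, cells_per_sample=100, seed=2)
combined = condition_profile([profile_a, profile_b], [len(well_a), len(well_b)])
print(combined)
```

Each resulting profile is a probability vector over the k subpopulations, so it can be compared with other profiles by using the similarity measure of Eq. 1.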

1. Chang HH, Hemberg M, Barahona M, Ingber DE, Huang S (2008) Transcriptome-wide noise controls lineage choice in mammalian progenitor cells. Nature 453:544–547.
2. Bahar R, et al. (2006) Increased cell-to-cell variation in gene expression in ageing mouse heart. Nature 441:1011–1014.
3. Bar-Even A, et al. (2006) Noise in protein expression scales with natural protein abundance. Nat Genet 38:636–643.
4. Colman-Lerner A, et al. (2005) Regulated cell-to-cell variation in a cell-fate decision system. Nature 437:699–706.
5. Ferrell JE, Jr, Machleder EM (1998) The biochemical basis of an all-or-none cell fate switch in Xenopus oocytes. Science 280:895–898.
6. Geva-Zatorsky N, et al. (2006) Oscillations and variability in the p53 system. Mol Syst Biol 2:2006.0033.
7. Raser JM, O'Shea EK (2004) Control of stochasticity in eukaryotic gene expression. Science 304:1811–1814.
8. Raser JM, O'Shea EK (2005) Noise in gene expression: origins, consequences, and control. Science 309:2010–2013.
9. Samadani A, Mettetal J, van Oudenaarden A (2006) Cellular asymmetry and individuality in directional sensing. Proc Natl Acad Sci USA 103:11549–11554.
10. Campbell LL, Polyak K (2007) Breast tumor heterogeneity: cancer stem cells or clonal evolution? Cell Cycle 6:2332–2338.
11. Zhang M, Rosen JM (2006) Stem cells in the etiology and treatment of cancer. Curr Opin Genet Dev 16:60–64.
12. Gascoigne KE, Taylor SS (2008) Cancer cells display profound intra- and interline variation following prolonged exposure to antimitotic drugs. Cancer Cell 14:111–122.
13. Balaban NQ, Merrin J, Chait R, Kowalik L, Leibler S (2004) Bacterial persistence as a phenotypic switch. Science 305:1622–1625.
14. Chen X, Velliste M, Murphy RF (2006) Automated interpretation of subcellular patterns in fluorescence microscope images for location proteomics. Cytometry A 69:631–640.
15. Conrad C, et al. (2004) Automatic identification of subcellular phenotypes on human cell arrays. Genome Res 14:1130–1136.
16. Echeverri CJ, Perrimon N (2006) High-throughput RNAi screening in cultured cells: A user's guide. Nat Rev Genet 7:373–384.
17. Johnson RL, et al. (2008) A quantitative high-throughput screen identifies potential epigenetic modulators of gene expression. Anal Biochem 375:237–248.
18. Loo LH, Wu LF, Altschuler SJ (2007) Image-based multivariate profiling of drug responses from single cells. Nat Methods 4:445–453.
19. Matsuyama A, et al. (2006) ORFeome cloning and global analysis of protein localization in the fission yeast Schizosaccharomyces pombe. Nat Biotechnol 24:841–847.
20. Mora-Bermudez F, Ellenberg J (2007) Measuring structural dynamics of chromosomes in living cells by fluorescence microscopy. Methods 41:158–167.
21. Neumann B, et al. (2006) High-throughput RNAi screening by time-lapse imaging of live human cells. Nat Methods 3:385–390.
22. Newberg J, Murphy RF (2008) A framework for the automated analysis of subcellular patterns in human protein atlas images. J Proteome Res 7:2300–2308.
23. O'Brien P, Haskins JR (2007) In vitro cytotoxicity assessment. Methods Mol Biol 356:415–425.
24. Pepperkok R, Ellenberg J (2006) High-throughput fluorescence microscopy for systems biology. Nat Rev Mol Cell Biol 7:690–696.
25. Perlman ZE, et al. (2004) Multidimensional drug profiling by automated microscopy. Science 306:1194–1198.
26. Perrimon N, Mathey-Prevot B (2007) Applications of high-throughput RNA interference screens to problems in cell and developmental biology. Genetics 175:7–16.
27. Sonnichsen B, et al. (2005) Full-genome RNAi profiling of early embryogenesis in Caenorhabditis elegans. Nature 434:462–469.
28. Tanaka H, et al. (2007) Lineage-specific dependency of lung adenocarcinomas on the lung development regulator TTF-1. Cancer Res 67:6007–6011.
29. Sachs K, Perez O, Pe'er D, Lauffenburger DA, Nolan GP (2005) Causal protein-signaling networks derived from multiparameter single-cell data. Science 308:523–529.
30. Yin Z, et al. (2008) Using iterative cluster merging with improved gap statistics to perform online phenotype discovery in the context of high-throughput RNAi screens. BMC Bioinformatics 9:264.
31. Plasier B, Lloyd DR, Paul GC, Thomas CR, Al-Rubeai M (1999) Automatic image analysis for quantification of apoptosis in animal cell culture by annexin-V affinity assay. J Immunol Methods 229:81–95.
32. Wang M, Zhou X, King RW, Wong ST (2007) Context based mixture model for cell phase identification in automated fluorescence microscopy. BMC Bioinformatics 8:32.
33. Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc 39:1–38.
34. Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86.
35. Htun H, Barsony J, Renyi I, Gould DL, Hager GL (1996) Visualization of glucocorticoid receptor translocation and intranuclear organization in living cells with a green fluorescent protein chimera. Proc Natl Acad Sci USA 93:4845–4950.
36. Lundholt BK, Scudder KM, Pagliaro L (2003) A simple technique for reducing edge effect in cell-based assays. J Biomol Screen 8:566–570.
37. Sternberg S (1983) Biomedical image processing. IEEE Comput 16:22–34.
38. Kaufman L, Rousseeuw PJ (1990) Finding Groups in Data: An Introduction to Cluster Analysis (Wiley, New York).
39. Mann HB, Whitney DR (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 18:50–60.
40. Borg I, Groenen P (1997) Modern Multidimensional Scaling: Theory and Applications (Springer, New York).

Slack et al. PNAS | December 9, 2008 | vol. 105 | no. 49 | 19311