Characterizing Heterogeneous Cellular Responses to Perturbations
Michael D. Slack (a,b), Elisabeth D. Martinez (a,c), Lani F. Wu (a,b,1), and Steven J. Altschuler (a,b,1)

(a) Department of Pharmacology, (b) Green Center for Systems Biology, Simmons Cancer Center, and the (c) Hamon Center for Therapeutic Oncology Research, University of Texas Southwestern Medical Center, Dallas, TX 75390

Edited by Alfred G. Gilman, University of Texas Southwestern Medical Center, Dallas, TX, and approved October 16, 2008 (received for review July 20, 2008)

Cellular populations have been widely observed to respond heterogeneously to perturbation. However, interpreting the observed heterogeneity is an extremely challenging problem because of the complexity of possible cellular phenotypes, the large dimension of potential perturbations, and the lack of methods for separating meaningful biological information from noise. Here, we develop an image-based approach to characterize cellular phenotypes based on patterns of signaling marker colocalization. Heterogeneous cellular populations are characterized as mixtures of phenotypically distinct subpopulations, and responses to perturbations are summarized succinctly as probabilistic redistributions of these mixtures. We apply our method to characterize the heterogeneous responses of cancer cells to a panel of drugs. We find that cells treated with drugs of (dis-)similar mechanism exhibit (dis-)similar patterns of heterogeneity. Despite the phenotypic diversity of cells observed within our data, low-complexity models of heterogeneity were sufficient to distinguish most classes of drug mechanism. Our approach offers a computational framework for assessing the complexity of cellular heterogeneity, investigating the degree to which perturbations induce redistributions of a limited, but nontrivial, repertoire of underlying states and revealing functional significance contained within distinct patterns of heterogeneous responses.

automated microscopy | cellular heterogeneity | image analysis

Author contributions: M.D.S., E.D.M., L.F.W., and S.J.A. designed research; M.D.S. and E.D.M. performed research; M.D.S. analyzed data; and M.D.S., E.D.M., L.F.W., and S.J.A. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
(1) To whom correspondence may be addressed at: Department of Pharmacology, University of Texas Southwestern Medical Center, 6001 Forest Park Road, ND9.214, Dallas, TX 75390. E-mail: [email protected] or [email protected].
This article contains supporting information online at www.pnas.org/cgi/content/full/0807038105/DCSupplemental.
© 2008 by The National Academy of Sciences of the USA
19306–19311 | PNAS | December 9, 2008 | vol. 105 | no. 49 | www.pnas.org/cgi/doi/10.1073/pnas.0807038105

Phenotypic heterogeneity in cellular populations has been observed in diverse physiological processes (1–9), pathophysiological conditions (10, 11), and responses to therapeutics (10, 12, 13). However, interpreting cellular heterogeneity, and its changes in response to perturbations, has been an extremely challenging problem. Does heterogeneity contain biologically (or clinically) important information? If so, what level of resolution is required to extract this information?

The complexity of cellular heterogeneity is not yet well understood: At one extreme, heterogeneity may reflect biological "noise" about a single "average" phenotypic state, whereas at the other extreme, it may reflect unbounded complexity, with each cell representing a distinct state. An intermediate alternative, which we investigate here, is that cellular populations can be well described as mixtures of a limited number of phenotypically distinct subpopulations. A practical challenge is to determine the complexity of the observed cellular phenotypes and develop methods for identifying patterns of cellular heterogeneity.

Here, we developed methods for characterizing patterns of spatial heterogeneity observed within cellular populations. To capture cellular heterogeneity, we made use of high-content image-based assays, an increasingly powerful approach for capturing cellular responses to perturbations and facilitating the objective quantification of cellular phenotypes (14–28). [An interesting alternative, not considered here, is to study heterogeneity captured by fluorescence-activated cell sorting (FACS), which provides increased numbers of readouts per cell, but without spatial resolution (29).] First, we applied image-processing algorithms to extract phenotypic measurements of the activation and colocalization patterns of cellular readouts from large numbers of cells in diverse conditions. Next, we used unsupervised clustering algorithms to identify phenotypic "stereotypes" within the overall population and assign probabilities to cells belonging to subpopulations modeled on these stereotypes. Subpopulation identification is based on application of a Gaussian mixture model (GMM), which has been applied recently in the context of online phenotype discovery (30). Finally, for each population (or condition), we estimated the fraction of cells in each subpopulation and summarized the results as a probability vector. The resulting "subpopulation profiles" are human- and machine-interpretable and enable the comparison of heterogeneous responses across multiple physiological, pathological, or environmental perturbations.

To develop our approach, we analyzed existing and new image databases of drug-treated HeLa cells. Drug treatments provided reliable and well-characterized mechanistic classes of perturbations that could be varied in both time and dose. Furthermore, existing drug classifications provided a biologically motivated method for estimating a crucial property of cellular heterogeneity, namely, whether increasing numbers of subpopulations improved the ability to distinguish different mechanistic classes of perturbations. [Traditionally, the tradeoffs between model fit and complexity are addressed by a nonbiological criterion, such as the Bayesian Information Criterion (BIC) or gap statistics (30).] Thus, drug perturbations and their classifications provided an unbiased criterion for determining the degree of complexity required to interpret patterns of cellular heterogeneity.

[Figure 1 image not reproduced. Labels recovered from the figure: Image database; Feature space; Subpopulations; Reference subpopulation models. Scatter-plot axes: PC 1 and PC 2. Stacked bars: subpopulations S1 to S4 on a 0% to 100% scale. Marker sets: DNA/SC35/anillin, DNA/p-p38/p-ERK, and DNA/actin/α-tubulin.]

Fig. 1. Heterogeneous cellular populations can be characterized as mixtures of subpopulations with distinct phenotypes. For convenience of visualization, results are shown for k = 4 subpopulations. (A and B) Illustration of approach. (A) HeLa cells are treated with drugs, labeled with fluorescence markers, and imaged. Individual cellular regions are identified. (B) A high-dimensional vector of phenotypic features is extracted from each cell. Cells are assigned probabilities of belonging to subpopulations that have been identified by using a GMM. Scatter plot: cells are visualized as points via feature representation and PCA reduction to two dimensions; colors of cell outlines in A and of points in B correspond to the most probable subpopulation assignment; ellipses represent unit standard deviation from the mean for each subpopulation. Shown are 3 representative cells taken from various locations in each ellipse. (C) Shown are subpopulation models for different marker sets. Models are built from "reference" sets of cells sampled from multiple conditions. Scatter plots are as in B. Stacked bar graphs represent (prior) probabilities of cell membership in each subpopulation. Sample cells from within 1 standard deviation of the mean phenotype are shown at right for each subpopulation (image colors correspond to markers in order; e.g., blue/green/red corresponds to DNA/SC35/anillin). (Scale bar: 20 μm.)

Results

To study the heterogeneous effects of perturbations on cells, we made use of an existing image database of HeLa cells treated with 100 drugs at 13 concentrations and stained by using a multiplexed collection of immunofluorescent markers (DNA/SC35/anillin) (25). However, this dataset did not contain temporal information and additionally lacked consistent numbers of representative drugs in several key categories of drug mechanism. Thus, we created a 25-drug dataset capturing the responses of HeLa cells to drugs at varying concentrations [low or high, as specified in supporting information (SI) Table S1] and time points (3, 6, 12, or 24 h). For this dataset, we selected 5 drugs in each of 5 known functional categories, affecting DNA replication (e.g., methotrexate), histone deacetylation (e.g., trichostatin A), microtubules (e.g., paclitaxel), glucocorticoid receptors (e.g., dexamethasone), and topoisomerase (e.g., doxorubicin). We additionally included an "other" category containing 10 drugs of miscellaneous (e.g., dichloroacetic acid) or unspecified (e.g., green tea-extracted polyphenols) mechanism. After fixation, cells were stained with 1 of 2 sets of fluorescent markers

Although alternative choices of cellular features could certainly be