
Leading Edge Essay

Network News: Innovations in 21st Century Systems Biology

Adam P. Arkin1,4,* and David V. Schaffer1,2,3,4
1Department of Bioengineering
2Department of Chemical and Biomolecular Engineering
3The Helen Wills Neuroscience Institute
University of California, Berkeley, Berkeley, CA 94720, USA
4Physical Biosciences Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
*Correspondence: [email protected]
DOI 10.1016/j.cell.2011.03.008

A decade ago, seminal perspectives and papers set a strong vision for the field of systems biology, and a number of these themes have flourished. Here, we describe key technologies and insights that have elucidated the evolution, architecture, and function of cellular networks, ultimately leading to the first predictive genome-scale regulatory and metabolic models of organisms. Can systems approaches bridge the gap between correlative analysis and mechanistic insights?

Systems biology aims to understand how individual elements of the cell interact to generate behaviors that allow survival in changeable environments and collective cellular organization into structured communities. Ultimately, these cellular networks assemble into larger population networks to form large-scale ecologies and thinking machines, such as humans. Given this central focus on codifying the organizational principles and algorithms of life, we argue that systems biology is not a newly emerging field, but rather a mature synthesis of thought about the implications of biological structure and its dynamic organization, ideas that have been brewing for more than a century.

To many scientists, the beginning of the last decade marked the definition and rise of the field of systems biology. However, systems biology’s conceptual origins date back almost 100 years. In 1917, D’Arcy Thompson formalized the first link between development, evolution, and physics in his treatise On Growth and Form, when he observed that the shapes and functions of biological systems were fundamentally determined by physical requirements and mechanical laws. In 1939, Walter Cannon, then chairman of the Department of Physiology at Harvard Medical School, coined the term “homeostasis” when he noted that organisms hold essential physiological variables at constant values despite a fluctuating environment (Cannon, 1939). In 1943, the American mathematician Norbert Wiener, along with his coauthors, proposed that negative feedback loops would be central to maintaining this stability in biological systems (Rosenblueth et al., 1943), thus linking concepts of control and optimality with biological dynamics. Ten years later, the British developmental biologist Conrad Waddington laid some of the modern foundation for systems biology when he presciently conceptualized networks of cellular components (i.e., genes, cells, and tissues) as evolutionarily dynamical systems expressible as solutions to a series of simultaneous differential equations. Over his long career, Waddington argued for a truly dynamic systems theory of cellular decision making driven by gene expression and epigenetics (Waddington, 1954, 1977). When François Jacob and Jacques Monod unveiled the molecular mechanisms of gene regulation in 1962, they noted, “it is obvious from the analysis of these [bacterial genetic regulatory] mechanisms that their known elements could be connected into a wide variety of ‘circuits’ endowed with any desired degree of stability” (Jacob and Monod, 1962).

During the ensuing decade, scientists across a wide array of disciplines started exploring the nonlinear dynamics in biochemical networks. Although experimental data to support their theoretical hypotheses were still largely missing, this period was quite productive, as numerous fundamental principles came to light. These included the possible mechanisms and advantages of different biochemical switches and oscillators with and without biochemical noise (Goodwin, 1963); new models of metabolic control and engineering (Heinrich and Rapoport, 1974; Kacser and Burns, 1973); the reverse engineering of cellular networks (Bekey and Beneken, 1978); and abstracted models of these networks to understand the evolution and optimization of specific network “designs” (Kauffman, 1969). Indeed, these latter principles of how networks can be structured to achieve particular functions have been used more recently to explicitly predict natural network behavior.

Thus, by the early 1970s, the concepts and components were all in place for what encompasses most of what we call “systems biology”: the integrated molecular analysis of cellular networks. However, one roadblock remained: experimental data to support the models and hypotheses. This is where the last two decades have revolutionized the field of cellular network inference and analysis. Since the early 1990s, a vast array of technologies has dramatically improved the efficiency of manipulating cells genetically, the measurement of cellular components at high precision and completeness, and the dissemination of materials and information at unprecedented speeds (due to the other network revolution, which

has also left a conceptual mark on systems biology). Many of these biological technologies are scaling with a Moore’s Law-type dynamic (Moore, 1965), in which, every few years, the amount of DNA that can be sequenced or synthesized doubles for half the cost (as has the number of transistors on a microchip) (Carlson, 2003). Clearly, this ability to read and write genomic information has profoundly accelerated systems biology.

Cell 144, March 18, 2011 ©2011 Elsevier Inc.

Principles versus Prediction and Correlation versus Causation

This brief historical perspective suggests that discoveries in systems biology may be organized within a conceptual space (Figure 1). The y axis distinguishes between two relatively distinct objectives: deducing principles of network organization necessary for behaviors versus reverse engineering networks to predict their behavior. Strikingly, with the advent of scaling biological data, two general approaches have evolved to meet these objectives. On one hand, correlative studies, which are usually on the genomic scale, infer relationships among genes and modules of function. These studies can also annotate genes and their products by a “guilt-by-association” approach in which detailed biochemical information available about one gene or system is transferred to others with correlated behaviors. This strategy contrasts with a “causal” approach in which direct interactions among molecules are tracked to glean mechanistic insights. Interestingly, as genetic and biochemical technologies climb the scaling curves, correlative and causal studies have become more intermingled. In other words, as it becomes possible to rapidly alter any gene (Paddison et al., 2004), modulate any gene’s expression level, and perhaps even reorganize large regions of the genome (Gibson et al., 2010; Wang et al., 2009; Warner et al., 2010), mechanistic studies will become available at a genome scale.

Obviously, prediction is not truly antipodal to principles, nor is correlation distantly removed from causation; indeed, the quadrants are connected. However, when we asked a group of colleagues which systems biology papers over the last decade have been most important to the field, the resulting set of landmark studies naturally clustered into different regions of this systems biology “plane” (Figure 1).

Figure 1. A Simplified Scheme for Organizing Results in the Field of Systems Biology
References are placed (subjectively) into this space according to whether their respective study focused more on mechanistic insight or on large-scale correlation analysis (the x axis) and whether the results were primarily principles about cellular networks or predictions of their behavior (the y axis). (Because of space constraints, only the last name of the first author is given.)

Correlative Approaches

Genome-scale data have fundamentally changed the types of questions that we ask about cellular systems. We can now observe how genomes dynamically change expression in response to environmental conditions and then correlate these results to other phenotypes, such as growth, fate choices, and biosynthetic productivity. Such experiments have inspired several classes of analysis that can vastly improve the data-driven annotation of genomes, more strongly link genotype to phenotype through inferred networks of interaction, and predict behaviors of cellular systems (Figure 1, lower-left quadrant). They have also led to a wide array of conceptual interpretations about the organization and evolution of cellular networks into evolvable modules, the decomposition of these networks into recurrent regulatory “motifs” with useful dynamical function, and the robustness of these architectures to mutation (Figure 1, upper-left quadrant).

Correlative Approaches to Predicting Function

One type of analysis infers properties of biomolecules from correlated changes of genome-scale RNA, protein, DNA copy number, or metabolite abundance as it varies in time and across conditions. Most often, genes sharing common expression dynamics are inferred to share regulators and possibly functional roles, at least at some level (Brown and Botstein, 1999). The challenge in this area has been isolating the set of correlated genes from the background of measurement noise and from those genes with merely coincident coexpression. Although clustering techniques have been used for decades to derive relationships in complex correlative data sets such as those found in gene expression compendia, in 2000, Cheng and Church introduced an algorithm called “biclustering” that explicitly discovers “modules” from such data. This method identifies groups of genes, or “modules,”

with similar patterns of expression over a specific subset of conditions (Cheng and Church, 2000). Individual genes may belong to multiple modules, thereby allowing inference of their numerous functions and combinatorial regulation. This important work inspired an increasing number of algorithms concerned with identifying related sets of biomolecules from complex data and inferring their “modular” function. These algorithms thereby opened the door to discovering an apparent hierarchical modular architecture to cellular regulation, which complements the more informal “pathway” organization with which biologists were familiar. The modules of coherent function also greatly simplify construction and interpretation of predictive models, as they enable prediction of how different modules, rather than the individual constituent genes, are dynamically deployed, a system formulation that has far fewer variables and thus requires far less data.

Gene expression can be an indirect measurement of a component’s contribution to a particular cellular process, and thus, genetic perturbations and activity assays may be required. In seminal work, Giaever et al. (2002) constructed a bar-coded deletion library for the entire genome of Saccharomyces cerevisiae. This library enabled single-pot assays of the relative growth or fitness of each strain when exposed to a specific condition (Giaever et al., 2002). In a subsequent study, a growth phenotype for nearly every gene in yeast was identified using 1000 chemical perturbations (Hillenmeyer et al., 2008). These types of studies can rapidly dissect the cellular targets of drugs and even directly identify specific transporters involved. In addition, these studies have shown that genes displaying changes in expression under a given condition are not always the genes necessary for responding functionally to that condition (Giaever et al., 2002). Although the implications of this result are not fully understood, one obvious conclusion is that different types of experiments are required to deduce or even predict function of genes.

Correlative Prediction of Organization

Another type of analysis seeks to infer relationships among gene modules; in other words, the strategy used to infer function of a single gene is now extended to infer the underlying biochemical network (Arkin et al., 1997). In 2001, Ideker et al. combined genetic, macromolecular interaction, and expression data (both protein and gene) to infer how the galactose utilization network in yeast is regulated (Ideker et al., 2001). They then used the resulting “influence network” to predict how the system responds to genetic perturbations. Some of these predictions were validated by experiments, yet others were proven incorrect, suggesting that properties of this well-characterized regulatory network still await discovery.

Variants of this approach that applied additional, more sophisticated algorithms from multivariate statistics and machine learning quickly began to have a strong impact on the field. In particular, Hartemink et al. (2001) offered perhaps the first Bayesian approach for rating different network structural hypotheses (i.e., different patterns of molecular interaction) against data. Using a collection of 52 conditions, they demonstrated that it was possible to infer the regulatory interactions in the galactose pathway (Hartemink et al., 2001). Two years later, Segal et al. (2003) increased the power of these algorithms to infer the sets of genes (i.e., modules) regulated by particular transcription factors under specific conditions. This algorithm also correctly predicted new regulatory roles for less-characterized proteins (Segal et al., 2003). In particular, the model predicted that one putative transcription factor (Ypl230w) and two signaling molecules (Kin82 and Ppt1) were important for cellular response to three different conditions: heat shock, hypo-osmotic shift, and entry into stationary phase, respectively. Disrupting the genes elicited no expression phenotype in rich, unstressed conditions but strong changes in expression relative to wild-type in the condition predicted to be relevant for a given gene.

Applying a different statistical approach called “Partial Least Squares Regression,” Janes and colleagues undertook herculean efforts to measure and correlate mammalian cell survival, apoptosis, intracellular protein phosphorylation states, and kinase activities (thereby generating a data set with 7980 intracellular measurements) in response to combinations of extracellular growth factor and cytokine inputs (Janes et al., 2005, 2006). The resulting model successfully predicted the level of apoptosis as a function of cytokine inputs and led to the new mechanistic insight that cascades of autocrine signaling were involved in mediating downstream cell responses to the extracellular cues.

Shortly thereafter, in another landmark paper, Bonneau et al. (2007) demonstrated how the output of a new gene expression biclustering algorithm provided input to a clever regression algorithm that deciphers the transcriptional regulatory network of an archaeon (Halobacterium salinarum NRC-1) and predicts expression responses to >100 conditions (Bonneau et al., 2007). Recently, such correlative systems analyses have begun scaling up to link biomolecular networks to ecological networks. These pioneering studies are uncovering new scales of biological organization that should lead to entirely new principles of ecosystem function (Zhou et al., 2010).

Nevertheless, it is not yet clear how to optimally design perturbation repertoires to achieve maximum accuracy in annotating gene function and regulation and in predictive model inference with minimal expense. Also, it has yet to be proven that the models obtained in these types of studies are sufficiently accurate or inexpensive to have an impact in a medical or industrial setting. Nonetheless, the ability to collect such compendia of data, even from diverse types of experiments, is rapidly becoming a feasible task for even a single laboratory to accomplish. We predict that the increased accessibility to these large-scale data sets will enable the detailed characterization of organisms after their genomes are sequenced and may, ultimately, change what it means to “complete” the genome of an organism.

Uncovering Principles of Network Organization

The fact that clear functional modules of gene expression can be inferred from correlative data sets implies the existence of underlying organizational principles for these networks. Similar hierarchies of modules have been found in large-scale protein interaction data and metabolic networks. Certain “scale-free” topologies of molecular interaction networks have received considerable attention in biology

and other fields. Such topologies, which seem to arise often in both natural and human-designed systems, are characterized by a pattern of interconnectedness among the nodes (e.g., proteins) in which the number of interactions per node follows a power law. Influential papers have suggested that these topologies lead to robustness to perturbation (Jeong et al., 2000) and, in the case of proteins, naturally arise due to the evolutionary process of duplication and divergence (Rzhetsky and Gomez, 2001). Likewise, in developmental biology, it has been argued for decades that for integrated cellular processes to evolve, they must be dissociable into hierarchical, modular units that can adapt their behavior with little interference from other such units. Thus, interaction and expression modules may allow rapid, effective rewiring and tuning of internal dynamics (Price et al., 2007; Singh et al., 2008), such that this ability to evolve may even be a selectable trait (Earl and Deem, 2004). However, caution must be taken in assigning evolutionary meaning to apparent modularity (Lynch, 2007).

On slightly smaller size scales, certain topological motifs (that is, stereotypical small networks of regulatory interactions and chemical reactions) may have important control functions for cellular networks (Rao and Arkin, 2001). The availability of large-scale data has, in the last decade, enabled the discovery that certain motifs appear more often than expected by random chance (Shen-Orr et al., 2002), including feed-forward and feedback loops (for more on feed-forward loops, see Review by Yosef and Regev on page 886 of this issue). These motifs have potential functional importance, such as noise rejection, and appear physiologically robust but also evolutionarily flexible with tunable function (Voigt et al., 2005). Milo et al. (2002) hypothesized that these motifs might form a sort of basis set of dynamic functions from which complex optimized networks could be assembled in numerous contexts within and outside of biology (Milo et al., 2002).

A beautiful theoretical paper by Segrè et al. (2005) determined another organizational principle of cellular networks. They not only showed that functional modules could be inferred from growth phenotypes of double knockout mutants, but also that the epistatic interactions between pairs of genes in these modules always fell into one of two classes of interactions: buffering, in which epistasis diminishes the individual phenotypic effects of the two mutations, or aggravating, in which the deleterious, individual effects of two mutations are worsened by their combination (Segrè et al., 2005). Modules were thus “monochromatic” and never contained mixed-type genes, a principle that was recently verified experimentally (Costanzo et al., 2010).

These architectural principles uncovered from large sets of correlative data are evocative and well supported, but the challenge remains to find incontrovertible evidence for evolutionary selection of these architectures and to fully characterize their functional consequences.

Mechanistic Approaches to Study Causal Relationships

Although large-scale genomic data sets lend themselves to statistical analysis of correlation, causal analysis necessitates more detailed biochemical data on the networks’ effectors, such as proteins, second messengers, and metabolites. Unfortunately, the experimental analyses of these components have not enjoyed the same growth in scale as those of nucleic acids. That is, whereas volumes of data on one-dimensional genomes are readily available, causal analysis also requires multidimensional data on biomolecules’ interactions, reactions and their rates, localization, and transport. Mass spectrometry, imaging, genetic sensors, chemical probes, and other technologies are increasingly providing such data, but not yet at the same magnitude as genomic information. As a result, causal analyses of cellular networks initially focused on elucidating functional principles but are becoming increasingly empowered with data to enable prediction.

Uncovering Principles of Function

Large-scale models of biological networks face the challenges that molecular mechanisms are often complex and nonlinear (e.g., cooperative protein interactions and epigenetic regulation) and that many of their inherent parameters are unknown (e.g., affinities and rate constants). However, in some model systems, the biochemistry is sufficiently well characterized to enable the construction of elegant, large-scale models.

As a prime example, Tyson and colleagues (Chen et al., 2004) modeled the cell-cycle control system of Saccharomyces cerevisiae using a set of 35 ordinary differential equations (ODEs) representing molecular mechanisms and mass action (Chen et al., 2004) (for more on modeling the cell cycle, see Primer by Ferrell et al. on page 874 of this issue). The goal of the model was not to account for the full complexity of the system but instead to provide a reasonable approximation of network behavior and to uncover dynamical principles of the architecture. Indeed, their model succeeded in accounting for a majority of mutant phenotypes simulated.

Using a similar framework, El-Samad et al. (2005a, 2005b) modeled the heat shock response in Escherichia coli. Despite the simplicity of the response (deploying chaperones to keep proteins folded at higher temperature), this model uncovered complexity in the modular control structure of the system. It also demonstrated how the many feedback loops in this system confer the ability to respond quickly and robustly while also trying to minimize the energetic cost of heat shock protein expression (El-Samad et al., 2005a, 2005b). In another important study, Yi et al. used dynamical systems control theory to analyze bacterial chemotaxis (Yi et al., 2000), another system with well-characterized biochemistry. Building on the principle that negative feedback is often central to biological stability (Rosenblueth et al., 1943), the study found that integral feedback control underlies the robustness of network adaptation to significant perturbations in both the amounts and kinetic parameters of its component proteins. Interestingly, control engineers “reinvented” this strategy and proved that it is required, in certain conditions, to build robustness into electrical circuits and other systems.

Deterministic representations of networks are compromised when their constituents are present at low concentrations or undergo slow reactions. Moreover, early studies suggested that noise can significantly influence network function (Arkin et al., 1998). Elowitz et al. (2002) explored the principle that fluctuations in the quantities and reaction rates of gene expression machinery can cause noise in gene expression at both a global

level in a cell (extrinsic), as well as for an individual gene (intrinsic) (Elowitz et al., 2002). Indeed, subsequent single-molecule imaging studies directly confirmed that both translation (Yu et al., 2006) and transcription (Raj et al., 2006) can underlie such noisy protein expression.

The principle that noise is inherent in biological networks raised the question of whether its effects on biological fitness are neutral, positive, or negative. Although the value of noise depends on the system, in certain cases, noise appears to make positive contributions to fitness. Organisms have a need to adapt to changing environments, and two adaptation strategies are sensing and responding to change or stochastically switching phenotype.

Two theoretical studies arrived at the principle that, under some conditions, such as when transitions in selective environments are slow or cannot be sensed, stochastic fluctuations in an organism’s phenotype can increase its fitness (Kussell and Leibler, 2005; Wolf et al., 2005). In a study that combined experimental approaches with simulations, Weinberger et al. (2005) investigated this principle by analyzing stochastic effects in HIV infection (Weinberger et al., 2005). Low initial numbers of viral molecules, slow gene expression, and amplification by a positive feedback loop lead to very noisy gene expression, which for some infections yielded long delays in gene expression. This delayed expression contributed to the formation of latent HIV, which is clinically recognized as the most formidable barrier to the elimination of virus from a patient.

In an elegant study, Acar et al. (2008) engineered Saccharomyces cerevisiae strains that stochastically switched phenotypes at different rates. Interestingly, they found that the fast-switching strain outgrew the slow-switching strain in environments undergoing rapid fluctuations, whereas the slow-switching strain was more fit in environments that fluctuated slowly (Acar et al., 2008).

Predictive Analysis of Network and Cell Function

The complexity of molecular mechanisms and the scarcity of biochemical parameters often make the development of predictive models challenging. Ibarra et al. (2002) created a constraints-based whole-cell metabolic model for E. coli, in which stoichiometric, thermodynamic, and other constraints mathematically yielded a solution space of allowed metabolic network states (Ibarra et al., 2002). This model, which requires fewer parameters than full dynamical models, can make predictions of network function that optimize growth under different environmental conditions. Indeed, when Ibarra et al. grew E. coli on a new carbon substrate, the cells evolved to the metabolic state predicted by the model.

In some systems, substantive comparison to data can yield deterministic models increasingly capable of prediction. Hoffmann and colleagues (2002) analyzed the mammalian NF-κB system (Hoffmann et al., 2002), in which activation of this transcription factor upregulates expression of IκBα, a negative regulator of NF-κB. Integrating experimental data with a deterministic model enabled prediction of the oscillatory behavior of this module upon stimulation and perturbation. Finally, Schoeberl et al. (2002) developed a model with 94 ODEs to simulate epidermal growth factor signaling through MAP kinase, including receptor trafficking dynamics and intracellular phosphorylation cascades (Schoeberl et al., 2002). This is the first dynamic model of a large cellular signaling network that was carefully parameterized by prior experimental measurements and that yielded predictions of signal transduction dynamics, which were subsequently validated experimentally.

The Next Decade

As systems biology matures, the number of studies linking correlation with causation and principles with prediction continues to grow (Figure 1). Advances in measurement technologies that enable large-scale experiments across an array of parameters and conditions will increasingly meld these correlative and causal approaches, including correlative analyses leading to mechanistic hypothesis testing as well as causal models empowered with sufficient data to make predictions. In addition, the increasing number of organisms sequenced and the increasing ease of measurement and genetic manipulation will enable deep comparison of systems across phylogenetic trees, thereby enhancing our understanding of mechanistic features that are necessary for function and evolution. The increasing integration of experimental and computational technologies will thus corroborate, deepen, and diversify the theories that the earliest systems biologists used logic to infer, thereby inching us ever closer to that central question: “What is Life?”

ACKNOWLEDGMENTS

We would like to thank our colleagues for suggesting a number of the papers that we reference in this work, and we apologize for not being able to include all of them. The authors would like to acknowledge the National Institutes of Health (R01 GM073010-01), and work conducted by ENIGMA was supported by the Office of Science, Office of Biological and Environmental Research of the US Department of Energy under contract number DE-AC02-05CH11231.

REFERENCES

Acar, M., Mettetal, J.T., and van Oudenaarden, A. (2008). Nat. Genet. 40, 471–475.

Arkin, A., Ross, J., and McAdams, H.H. (1998). Genetics 149, 1633–1648.

Arkin, A.P., Shen, P.-D., and Ross, J. (1997). Science 277, 1275.

Bekey, G.A., and Beneken, J.E.W. (1978). Automatica 14, 41–47.

Bonneau, R., Facciotti, M.T., Reiss, D.J., Schmid, A.K., Pan, M., Kaur, A., Thorsson, V., Shannon, P., Johnson, M.H., Bare, J.C., et al. (2007). Cell 131, 1354–1365.

Brown, P.O., and Botstein, D. (1999). Nat. Genet. 21(1, Suppl), 33–37.

Cannon, W. (1939). The Wisdom of the Body (London: Norton).

Carlson, R. (2003). Biosecurity and Bioterrorism: Biodefense Strategy, Practice, and Science 1, 203–214.

Chen, K.C., Calzone, L., Csikasz-Nagy, A., Cross, F.R., Novak, B., and Tyson, J.J. (2004). Mol. Biol. Cell 15, 3841–3862.

Cheng, Y., and Church, G.M. (2000). Proc. Int. Conf. Intell. Syst. Mol. Biol. 8, 93–103.

Costanzo, M., Baryshnikova, A., Bellay, J., Kim, Y., Spear, E.D., Sevier, C.S., Ding, H., Koh, J.L., Toufighi, K., Mostafavi, S., et al. (2010). Science 327, 425–431.

Earl, D.J., and Deem, M.W. (2004). Proc. Natl. Acad. Sci. USA 101, 11531–11536.

El-Samad, H., Khammash, M., Homescu, C., and Petzold, L. (2005a). Proceedings 16th IFAC World Congress. http://engineering.ucsb.edu/cse/Files/IFACC_HS_OPT04.pdf.

848 Cell 144, March 18, 2011 ©2011 Elsevier Inc.

El-Samad, H., Kurata, H., Doyle, J.C., Gross, C.A., and Khammash, M. (2005b). Proc. Natl. Acad. Sci. USA 102, 2736–2741.
Elowitz, M.B., Levine, A.J., Siggia, E.D., and Swain, P.S. (2002). Science 297, 1183–1186.
Giaever, G., Chu, A.M., Ni, L., Connelly, C., Riles, L., Véronneau, S., Dow, S., Lucau-Danila, A., Anderson, K., André, B., et al. (2002). Nature 418, 387–391.
Gibson, D.G., Glass, J.I., Lartigue, C., Noskov, V.N., Chuang, R.-Y., Algire, M.A., Benders, G.A., Montague, M.G., Ma, L., Moodie, M.M., et al. (2010). Science 329, 52–56.
Goodwin, B.C. (1963). (London: Academic Press).
Hartemink, A.J., Gifford, D.K., Jaakkola, T.S., and Young, R.A. (2001). In Pacific Symposium on Biocomputing 2001 (PSB01), R. Altman, A.K. Dunker, L. Hunter, K. Lauderdale, and T. Klein, eds. (New Jersey: World Scientific), pp. 422–433.
Heinrich, R., and Rapoport, T.A. (1974). Eur. J. Biochem. 42, 89–95.
Hillenmeyer, M.E., Fung, E., Wildenhain, J., Pierce, S.E., Hoon, S., Lee, W., Proctor, M., St Onge, R.P., Tyers, M., Koller, D., et al. (2008). Science 320, 362–365.
Hoffmann, A., Levchenko, A., Scott, M.L., and Baltimore, D. (2002). Science 298, 1241–1245.
Ibarra, R.U., Edwards, J.S., and Palsson, B.O. (2002). Nature 420, 186–189.
Ideker, T., Thorsson, V., Ranish, J.A., Christmas, R., Buhler, J., Eng, J.K., Bumgarner, R., Goodlett, D.R., Aebersold, R., and Hood, L. (2001). Science 292, 929–934.
Jacob, F., and Monod, J. (1962). Cold Spring Harb. Symp. Quant. Biol. 26, 193–211.
Janes, K.A., Albeck, J.G., Gaudet, S., Sorger, P.K., Lauffenburger, D.A., and Yaffe, M.B. (2005). Science 310, 1646–1653.
Janes, K.A., Gaudet, S., Albeck, J.G., Nielsen, U.B., Lauffenburger, D.A., and Sorger, P.K. (2006). Cell 124, 1225–1239.
Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N., and Barabási, A.L. (2000). Nature 407, 651–654.
Kacser, H., and Burns, J.A. (1973). Symp. Soc. Exp. Biol. 27, 65–104.
Kauffman, S.A. (1969). J. Theor. Biol. 22, 437–467.
Kussell, E., and Leibler, S. (2005). Science 309, 2075–2078.
Lynch, M. (2007). Proc. Natl. Acad. Sci. USA 104 (Suppl 1), 8597–8604.
Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., and Alon, U. (2002). Science 298, 824–827.
Moore, G.E. (1965). Electronics 38, 114–117.
Paddison, P.J., Silva, J.M., Conklin, D.S., Schlabach, M., Li, M., Aruleba, S., Balija, V., O’Shaughnessy, A., Gnoj, L., Scobie, K., et al. (2004). Nature 428, 427–431.
Price, M.N., Dehal, P.S., and Arkin, A.P. (2007). PLoS Comput. Biol. 3, 1739–1750.
Raj, A., Peskin, C.S., Tranchina, D., Vargas, D.Y., and Tyagi, S. (2006). PLoS Biol. 4, e309.
Rao, C.V., and Arkin, A.P. (2001). Annu. Rev. Biomed. Eng. 3, 391–419.
Rosenbleuth, A., Wiener, N., and Bigelow, J. (1943). Philos. Sci. 10, 18–43.
Rzhetsky, A., and Gomez, S.M. (2001). Bioinformatics 17, 988–996.
Schoeberl, B., Eichler-Jonsson, C., Gilles, E.D., and Müller, G. (2002). Nat. Biotechnol. 20, 370–375.
Segal, E., Shapira, M., Regev, A., Pe’er, D., Botstein, D., Koller, D., and Friedman, N. (2003). Nat. Genet. 34, 166–176.
Segrè, D., Deluna, A., Church, G.M., and Kishony, R. (2005). Nat. Genet. 37, 77–83.
Shen-Orr, S.S., Milo, R., Mangan, S., and Alon, U. (2002). Nat. Genet. 31, 64–68.
Singh, A.H., Wolf, D.M., Wang, P., and Arkin, A.P. (2008). Proc. Natl. Acad. Sci. USA 105, 7500–7505.
Voigt, C.A., Wolf, D.M., and Arkin, A.P. (2005). Genetics 169, 1187–1202.
Waddington, C.H. (1954). Proceedings of the 9th International Congress of Genetics 9, 232–245.
Waddington, C.H. (1977). Tools for thought (New York: Basic Books).
Wang, H.H., Isaacs, F.J., Carr, P.A., Sun, Z.Z., Xu, G., Forest, C.R., and Church, G.M. (2009). Nature 460, 894–898.
Warner, J.R., Reeder, P.J., Karimpour-Fard, A., Woodruff, L.B., and Gill, R.T. (2010). Nat. Biotechnol. 28, 856–862.
Weinberger, L.S., Burnett, J.C., Toettcher, J.E., Arkin, A.P., and Schaffer, D.V. (2005). Cell 122, 169–182.
Wolf, D.M., Vazirani, V.V., and Arkin, A.P. (2005). J. Theor. Biol. 234, 227–253.
Yi, T.-M., Huang, Y., Simon, M.I., and Doyle, J. (2000). Proc. Natl. Acad. Sci. USA 97, 4649–4653.
Yu, J., Xiao, J., Ren, X., Lao, K., and Xie, X.S. (2006). Science 311, 1600–1603.
Zhou, J., Deng, Y., Luo, F., He, Z., Tu, Q., and Zhi, X. (2010). MBio. 1, e00169-10.

Leading Edge Essay

The Cell in an Era of Systems Biology

Paul Nurse1,2,* and Jacqueline Hayles1
1Cancer Research UK, London Research Institute, 44, Lincoln’s Inn Fields, London UK WC2A 3LY, UK
2The Rockefeller University, 1230 York Avenue, New York, NY 10021-6399, USA
*Correspondence: [email protected]
DOI 10.1016/j.cell.2011.02.045

The increasing use of high-throughput technologies and computational modeling is revealing new levels of biological function and organization. How are these features of systems biology influencing our view of the cell?

It is difficult to forecast the impact of systems biology on our understanding of the cell, an issue not made any easier by the fact that there is as yet no firm consensus as to what is meant by "systems biology," although as our colleague Marc Kirschner has said, "we all seem to know it when we see it." And in that spirit we will discuss here the various attributes and methodological approaches usually associated with systems biology, how they have been applied to cell biology, and how they may be developed to attain a better understanding of how cells work.

Reductionism and Holism
Discussions of systems biology often make a distinction between holistic and reductionist approaches. Our view is that scientific explanations and methodologies are essentially reductionist in nature. However, although it is difficult to imagine a scientific enquiry or explanation that is not reductionist, it is important to keep a focus on the behavior of whole systems in biology and to understand how the interactions and processes brought about by component parts acting at lower levels in a system are constrained by overall functions acting at higher levels.

Sometimes those of a more holistic persuasion object to the dominance of molecular explanations in cell biology, but the fact is that most useful explanations in cell biology have to be in terms of molecules because molecules are the most relevant lower-level component into which to decompose the function and organization of the cell. However, not all explanations in biology are molecular, for example developmental processes may be explained in terms of cell behavior (Towers and Tickle, 2009) and neurobiology by the action of neural networks (Langston et al., 2010), and likewise some insights in cell biology may not arise from strictly molecular explanations.

Biological Function and Organization
One approach to systems biology has been to emphasize the overall biological functions expressed at different levels of biological organization, such as the organelle, the cell, the tissue, the organ, and the organism. The level of the cell occupies a particularly important position (Brenner, 2010; Nurse, 2008) as it is the simplest unit exhibiting the characteristics of life, so understanding biological function at the level of the cell brings us closer to a better appreciation of the nature of life. The differing levels or units of organization from organelle to organism often exhibit teleonomic, that is, apparently purposeful behaviors (Monod, 1972). Examples of purposeful behavior include homeostasis and the maintenance of organizational integrity, the generation of spatial and temporal order, communication within and between the units of organization, and the reproduction of those units. The objective of this approach is to understand how teleonomic behaviors are generated at the different units of organization, usually in terms of molecules and of interactions between molecules. This view of systems biology stresses overall biological function of the relevant biological unit and is an approach encompassed by a number of traditional biological disciplines including physiology and forward genetics.

An interest in the overall biological functions of a living organism naturally leads to consideration of the influence of ecology and evolution on how that organism works. This also applies to cells, and increasingly those interested in systems approaches in cell biology are considering ecological and evolutionary perspectives (Ezov et al., 2006; Liti et al., 2009). Ecology is relevant to the relationships of a cell with other cells and with its physical environment and applies to both free-living single-celled organisms such as the yeasts and Protozoa and to cells within tissues. Ecological and evolutionary perspectives can help to understand how a cell has come to function as it does and to improve awareness of the selective pressures operating on a cell (Ding et al., 2010; Shah et al., 2009). Improved contacts between the ecological and evolutionary communities and cell biologists will enhance these studies.

Ensemble Descriptions
An approach often associated with systems biology is the generation of ensemble descriptions, that is, the collection of data describing the behavior of large numbers of components. This has been made possible by increasingly sophisticated technologies and analytical procedures, which have led to massively parallel collections of different types of data and the establishment of consortia such as ENCODE (http://encodeproject.org/) and databases such as the Saccharomyces Genome Database (SGD) and the fission yeast database (Pombase). The canonical ensemble approach has been whole-genome sequencing, which has allowed the description and comparison of gene contents for a wide range of organisms, facilitating molecular genetic analysis of biological mechanisms far beyond the limited numbers of genetically amenable model organisms.

Genome sequencing is particularly useful for cell biology because all living organisms are composed of cells, and so orthologous genes important for cellular phenomena can be studied in a variety of organisms. Cells in different organisms or in different tissues of the same organism allow orthologous genes and related cellular phenomena to be investigated in a range of situations yielding informative comparisons. A good example has been the comparison of cell-cycle control in yeast cells with metazoan embryos (Gould and Nurse, 1989; Murray and Kirschner, 1989). Knowledge of whole-genome sequences also allows gene ablation experiments to be carried out on a genome-wide basis. Two major methodologies have been used, systematic gene deletions and libraries of small-interfering RNAs (siRNAs). Other methodologies such as transposition have been used, particularly in prokaryotes (Zhang and Lin, 2009). To date, whole-genome gene deletions have only been completed in bacteria and yeasts (de Berardinis et al., 2008; Giaever et al., 2002; Kim et al., 2010) and have the advantage of completely ablating a gene function, making functional assignments and comparisons of gene functions between organisms more straightforward. siRNA libraries are very versatile as they can be employed in many organisms but can be subject to partial knockdown and off-target effects (Sioud, 2011). Successes using these approaches include the identification of all genes required for the viability of budding and fission yeast cells, for cellular processes such as centromeric cohesion in budding yeast (Marston et al., 2004), and for mitosis in human cells (Kittler et al., 2007; Neumann et al., 2010).

Ensemble descriptions have been used extensively, including microarrays for monitoring the types and levels of RNAs, mass spectroscopy for studying proteins, and mass spectroscopy and chromatography for assessing metabolites. Ensemble data collections have the advantage of avoiding the dangers of inadvertently "cherry-picking" data when studies are confined to work on limited numbers of gene products, which can result in investing too much importance to a particular RNA or protein simply because it is the only one under investigation. Comparisons between different cells and organisms allow the identification of gene products that are implicated more universally in a particular cellular phenomenon. For example, a comparative approach has enabled the identification of RNAs whose levels change at transitions through specific cell-cycle stages in a conserved manner in more than one organism (Rustici et al., 2004).

Networks
The availability of ensemble datasets also allows the systematic grouping of genes with related functions. For example, catalogs of genes that when deleted have a similar cellular phenotype will identify gene sets required for particular processes. Similarly, RNA transcripts that behave in a similar manner, such as peaking in level at a particular phase of the cell cycle, reveal RNAs that potentially have related roles. In this way, the "toolkit" required for a specific cellular process can be assembled. Another grouping approach is to construct networks based on gene products that interact with each other. Such networks can be assembled using interaction trap methodologies (such as two-hybrid methodologies and immunoprecipitations) that assess whether molecules are in physical contact. Also important are catalytic interactions resulting in metabolic changes or chemical modifications, such as phosphorylation. Biochemical approaches can be complemented by high-throughput genetic interaction assays (screening, for example, for synthetic lethality), although these do not necessarily provide evidence for direct physical interaction between components. Green fluorescent protein tags can be used to identify molecular components that spatially colocalize, as an indicator of potential functional relationships (Huh et al., 2003; Matsuyama et al., 2006). These various methodologies allow networks to be built up that connect molecular components throughout the cell to generate an overall cellular interactome (Collins et al., 2007; Rual et al., 2005). The power of these networks is enhanced when they are combined with catalogs of genes involved in a particular cellular function because they lead to a better molecular understanding of the process of interest. Interaction networks can also identify linker components that connect different functional networks and processes (Zhong et al., 2009) and may therefore have interesting regulatory roles.

For many researchers, the creation of interaction networks is a major goal of systems cell biology that is aimed at providing complete networks of different cellular processes. However, achieving this aim may require more sophisticated languages or notations to fully describe how the networks work. Unlike simple networks, such as an airline transportation network, the interaction linkages in biological networks may represent stable complexes or transient catalytic reactions or may reflect the logical nature of the interaction, for example a representation of a negative or positive feedback. The notation used in network descriptions needs to reflect this complexity. It is also important to take account of the fact that the linkages are not always hard-wired because they are mostly based on chemistry with connections established by chemicals diffusing from one component to another. These chemical linkages can readily break and reform to connect different components and remodel the architecture of the network (Bray, 2009).

Quantitative Methodologies
Quantitative methodologies involving both large datasets and the modeling of data are frequently used in systems approaches. The massively parallel collections of data as generated by microarrays, for example, have superseded the more qualitative measurements of traditional molecular biology with techniques such as northern and western blotting. An advantage of good quantification is that it leads to a better appreciation of the effect of the number of molecules within a cell on biological processes. This allows an assessment of the stoichiometry between different molecular components as well as recognition that there may be only a few molecules of a particular type present within a cell. Some gene transcripts in yeast are present at an average of less than one per cell (Velculescu et al., 1997), shifting our view of regulation from being driven by mass action, which is analog in character, to one that is more stochastic and digital. This means that greater attention is needed on the influence of molecular noise on the cell (Newman et al., 2006). An important issue is whether noise and the variability it generates between cells are exploited for regulatory purposes, for example to ensure a range of cellular responses to environmental changes such as the competence state of Bacillus subtilis (Maamar et al., 2007; Suel et al., 2006). Monitoring of single cell behaviors has revealed that there is a significant variation between cells that was not appreciated previously by global population analyses (Choi and Kim, 2009). The combined use of photomicrography and robotic microscopes is capable of generating large amounts of data to investigate these effects of noise and stochastic behaviors in cells, for example cell size variability at the G1-S transition in budding yeast (Di Talia et al., 2007).

Biochemical processes are generally modeled by deriving differential equations to calculate flux through pathways using in vivo estimates of the rate constants and concentrations of components. Although modeling in cell biology has become more popular in recent years, in part due to the massive increase in data available and to the migration of more theoretically inclined scientists to biology, in the past it was only pursued by a few committed individuals (Novak and Tyson, 1997; Tyson, 1983). The evolutionary biologist John Maynard Smith contended that the act of thinking about a model’s equations greatly clarifies understanding of how the model works. Biologists have a tendency to produce somewhat loosely formulated models summarized in the form of cartoons, and it is useful to subject these to the discipline of writing equations in the expectation that the thought imposed by equation writing will improve understanding of the model’s assumptions and dynamics.

However, two major problems are often encountered when generating mathematical models for cell biology: the complexity of the pathways being modeled and the difficulty of estimating the appropriate values for rate constants and the concentration of components. Biochemical pathways are often complex with many redundant functions, reflecting the fact that evolution does not always lead to, from an engineer’s point of view, the most efficient and economic solutions (Jacob, 1977; Saunders and Ho, 1976). Natural selection acts on pre-existing cells often by making additions to previously operational pathways, and these additions increase redundancy. In this respect modeling in biology may differ from physics where the aesthetic is to search for the simplest and most elegant model to explain a phenomenon. In biology, there are often more elements in a model than are strictly necessary and some act redundantly. The number of elements also increases the degrees of freedom available, reducing confidence in the outcome of the modeling process.

There are several ways these difficulties can be addressed. One way used by modelers is to test the sensitivity of models to make sure that they still work well when the parameters used in the equations are varied. If the model still behaves robustly when different values are used in the equations, then confidence in the model is increased. It is also helpful if the biological function being studied can be recapitulated in vitro. Many quite complex processes can be carried out in concentrated Xenopus egg and cell extracts, for example important aspects of cell-cycle control (Blow and Laskey, 1986; Deibler and Kirschner, 2010). In Xenopus extracts, for instance, the levels of biochemical components can be both measured and manipulated more easily than is possible in a living cell. Fluorochrome-based sensor modules combined with light microscopy are also providing better ways of measuring concentrations within cells in vivo, such as protein levels in budding yeast cells (Newman et al., 2006). Another approach is to simplify the biochemical network underlying the biological function of interest, although this is only useful if the essential elements of that process are still maintained. The advantage of simplification is that it reduces the degrees of freedom available, making modeling easier and the outcome more reliable.

An example of the potential for network simplification is seen with a recent genetic manipulation of the mitotic control network in fission yeast (Coudreuse and Nurse, 2010). Many gene products have been identified that regulate the cyclin-dependent kinases (CDKs), and several quantitative models have been generated that can explain how CDKs are controlled to ensure orderly progression through the cell cycle at the correct cell size. Unexpectedly, a number of the gene products thought to have been important in CDK regulation and mitotic control can be eliminated while still maintaining good size control over mitotic onset. This simplified network focuses attention on those elements that are sufficient to generate good mitotic control and cell size homeostasis, reducing the degrees of freedom and making modeling more straightforward. In a way, this is synthetic biology in reverse; rather than building a simple network de novo, such as an oscillator or clock (Danino et al., 2010; Elowitz and Leibler, 2000), a pre-existing network is simplified. Both approaches lead to the same outcome: the generation of simpler models still capable of explaining the biological function of interest.

Managing Information
Networks and quantitative modeling are closely associated with information management within the cell. Many of the most insightful explanations in cell biology have been made in terms of information flow; this involves understanding how the cell gathers, processes, stores, and uses information in the context of a biological function or phenomenon of interest (Nurse, 2008). Information is gathered from both outside and inside the cell and is processed and communicated to different parts of the cell. Storage of information occurs over a wide range of timescales, from the long timescale seen in heredity (encoded in the DNA sequence and possibly mediated by epigenetics), through to the medium timescale seen with mRNA and gene transcriptional circuits, to the short timescale seen with activated small G proteins (Bonasio et al., 2010; Etienne-Manneville and Hall, 2002; Roy et al., 2010). Information is used to direct cell behaviors, coordinating appropriate responses to changing circumstances. Recognition of the significance of information was crucial at the beginning of molecular biology, particularly in dealing with how information flowed from gene to protein, although it applies to all aspects of cell behavior. The iconic examples from that time are the concepts that DNA acts as a digital information storage device (Brenner et al., 1961) and that the lac operon regulatory circuit forms a negative feedback loop (Dickson et al., 1975; Lin and Riggs, 1975; Ohki and Sato, 1975). Systems biology, by generating datasets, networks, and models, provides an opportunity to understand information flow through the cell. In our view, this is one of the most important aspects of systems analyses in cell biology and will help move studies from descriptions of biological phenomena to a better understanding of how they work.

Information management involves various processing elements or logic modules that carry out particular computational functions, which can be categorized according to the type of function they carry out. For example, a negative feedback loop communicates information from a late step in a pathway to an earlier step, and if there is increased flow at the later step, then a negative signal is sent to the earlier one, reducing overall flow through the pathway and thus maintaining homeostasis. In contrast, a positive feedback loop sends a positive signal that increases overall flow to generate a switch to maximum flow through the pathway. More complex logic modules produce more sophisticated responses, such as toggles switching between two states, timers measuring elapsed time, oscillators cycling in time, and gradients measuring cellular dimensions (Tyson et al., 2003). The operation of these modules depends on how the various components are linked together and the shapes of the response curves that determine the character of those interactions. There is a need to build on past work to construct a full listing of the different types of logic modules that are operational in cells. Working with engineers and cyberneticists should be helpful in achieving this goal (Alon, 2003; Nurse, 2008).

An emphasis on information management may reveal some unexpected features of cells. An example is the potential for dynamics to enrich information transfer through signaling pathways. Such pathways are usually thought of as on/off switches that can only be in one of two states. However, if signals are pulsed down the pathway and the output depends on the dynamics of those pulses, then more information can be communicated (von Kriegsheim et al., 2009). This is the same idea that forms the basis for Morse code, a system that communicates complex messages by utilizing the dynamics produced by a series of dots and dashes. Information is also managed in the three dimensions of cellular space (Scott and Pawson, 2009). Not only must spatial information be generated to define the space of the cell but the availability of various cellular compartments means that different information can be stored in different places and a wide variety of connections between logic modules can be formed and reformed through diffusible chemicals. The richness of behavior possible with this arrangement is reminiscent of the complex behaviors normally associated with neural networks.

A Cell Biology Systems Initiative
The cell is the simplest unit that exhibits the characteristics of life and so is likely to be the most effective level in biology to investigate how life works. The tools and intellectual framework of systems biology will provide great opportunities to achieve this objective by generating the data needed and the approaches required for a comprehensive understanding of the cell. This applies to all types of cells including bacteria, which can have small genomes and where there have been great advances in recent years (Wang et al., 2010). But it is with eukaryotic cells where the greatest benefits are likely to be realized because already much work has been achieved and the conservation of many processes across eukaryotes means that different cell types with differing characteristics and strengths in methodologies can be used to study the same biological phenomena.

It’s perhaps not surprising that we, as two yeast geneticists, would recommend the unicellular budding and fission yeasts as good models for studying many aspects of cell biology using a systems approach. Both organisms are eukaryotes with small genomes of only 5000–6000 genes, making systems genomic analyses more straightforward to carry out. The availability of genome-wide gene deletion collections together with other methods for saturation forward genetics (Guo and Levin, 2010) allows the identification of nearly all the genes in the genome that are involved in a particular cellular function or process. Application of interaction trap procedures together with bioinformatics will help to identify the biochemical roles of gene functions and to group them into the networks responsible for the process. Genetics can be used to simplify the network, focusing attention on the core gene functions responsible for the process to help with subsequent modeling. Comparisons with cells in other organisms will test whether the conclusions being reached can be generalized across species including human cells and also allow in vitro systems to be developed especially with Xenopus egg extracts.

A major aim with the initiative will be to explain as often as possible a cell biological function or process in terms of information management. This requires interdisciplinary approaches and is not so straightforward because cell biology experiments generally yield biochemical results, and there are no easy ways to translate chemistry into the information processing elements or logic modules that we have argued are needed for good understanding. It would be helpful if there were more effective ways to model pathways and networks without having to know all the rate constants and concentrations involved, and we have previously outlined possible procedures that may help with that elsewhere (Nurse, 2008). Despite these difficulties, we are now well placed to apply the methods of systems biology more comprehensively to cell biology to gain greater insight into how cells work.

ACKNOWLEDGMENTS

We would like to thank our colleagues at The Rockefeller University and CRUK London Research Institute, particularly L. Weston and J. Wu, for helpful comments on the manuscript. We would also like to acknowledge the many researchers whose work we have not been able to reference because of space constraints.

REFERENCES

Alon, U. (2003). Science 301, 1866–1867.
Blow, J.J., and Laskey, R.A. (1986). Cell 47, 577–587.
Bonasio, R., Tu, S., and Reinberg, D. (2010). Science 330, 612–616.
Bray, D. (2009). Wetware: A Computer in Every Living Cell (New Haven, CT: Yale University Press).
Brenner, S. (2010). Philos. Trans. R. Soc. Lond. B Biol. Sci. 365, 207–212.
Brenner, S., Jacob, F., and Meselson, M. (1961). Nature 190, 576–581.

Choi, J.K., and Kim, Y.J. (2009). Nat. Genet. 41, 498–503.
Collins, S.R., Miller, K.M., Maas, N.L., Roguev, A., Fillingham, J., Chu, C.S., Schuldiner, M., Gebbia, M., Recht, J., Shales, M., et al. (2007). Nature 446, 806–810.
Coudreuse, D., and Nurse, P. (2010). Nature 468, 1074–1079.
Danino, T., Mondragon-Palomino, O., Tsimring, L., and Hasty, J. (2010). Nature 463, 326–330.
de Berardinis, V., Vallenet, D., Castelli, V., Besnard, M., Pinet, A., Cruaud, C., Samair, S., Lechaplais, C., Gyapay, G., Richez, C., et al. (2008). Mol. Syst. Biol. 4, 174.
Deibler, R.W., and Kirschner, M.W. (2010). Mol. Cell 37, 753–767.
Di Talia, S., Skotheim, J.M., Bean, J.M., Siggia, E.D., and Cross, F.R. (2007). Nature 448, 947–951.
Dickson, R.C., Abelson, J., Barnes, W.M., and Reznikoff, W.S. (1975). Science 187, 27–35.
Ding, L., Ellis, M.J., Li, S., Larson, D.E., Chen, K., Wallis, J.W., Harris, C.C., McLellan, M.D., Fulton, R.S., Fulton, L.L., et al. (2010). Nature 464, 999–1005.
Elowitz, M.B., and Leibler, S. (2000). Nature 403, 335–338.
Etienne-Manneville, S., and Hall, A. (2002). Nature 420, 629–635.
Ezov, T.K., Boger-Nadjar, E., Frenkel, Z., Katsperovski, I., Kemeny, S., Nevo, E., Korol, A., and Kashi, Y. (2006). Genetics 174, 1455–1468.
Giaever, G., Chu, A.M., Ni, L., Connelly, C., Riles, L., Veronneau, S., Dow, S., Lucau-Danila, A., Anderson, K., Andre, B., et al. (2002). Nature 418, 387–391.
Gould, K.L., and Nurse, P. (1989). Nature 342, 39–45.
Guo, Y., and Levin, H.L. (2010). Genome Res. 20, 239–248.
Huh, W.K., Falvo, J.V., Gerke, L.C., Carroll, A.S., Howson, R.W., Weissman, J.S., and O’Shea, E.K. (2003). Nature 425, 686–691.
Jacob, F. (1977). Science 196, 1161–1166.
Kim, D.U., Hayles, J., Kim, D., Wood, V., Park, H.O., Won, M., Yoo, H.S., Duhig, T., Nam, M., Palmer, G., et al. (2010). Nat. Biotechnol. 28, 617–623.
Kittler, R., Pelletier, L., Heninger, A.K., Slabicki, M., Theis, M., Miroslaw, L., Poser, I., Lawo, S., Grabner, H., Kozak, K., et al. (2007). Nat. Cell Biol. 9, 1401–1412.
Langston, R.F., Ainge, J.A., Couey, J.J., Canto, C.B., Bjerknes, T.L., Witter, M.P., Moser, E.I., and Moser, M.B. (2010). Science 328, 1576–1580.
Lin, S., and Riggs, A.D. (1975). Cell 4, 107–111.
Liti, G., Carter, D.M., Moses, A.M., Warringer, J., Parts, L., James, S.A., Davey, R.P., Roberts, I.N., Burt, A., Koufopanou, V., et al. (2009). Nature 458, 337–341.
Maamar, H., Raj, A., and Dubnau, D. (2007). Science 317, 526–529.
Marston, A.L., Tham, W.H., Shah, H., and Amon, A. (2004). Science 303, 1367–1370.
Matsuyama, A., Arai, R., Yashiroda, Y., Shirai, A., Kamata, A., Sekido, S., Kobayashi, Y., Hashimoto, A., Hamamoto, M., Hiraoka, Y., et al. (2006). Nat. Biotechnol. 24, 841–847.
Monod, J. (1972). Chance and Necessity: An Essay on the Natural Philosophy of Modern Biology (New York: Vintage Books).
Murray, A.W., and Kirschner, M.W. (1989). Nature 339, 275–280.
Neumann, B., Walter, T., Heriche, J.K., Bulkescher, J., Erfle, H., Conrad, C., Rogers, P., Poser, I., Held, M., Liebel, U., et al. (2010). Nature 464, 721–727.
Newman, J.R., Ghaemmaghami, S., Ihmels, J., Breslow, D.K., Noble, M., DeRisi, J.L., and Weissman, J.S. (2006). Nature 441, 840–846.
Novak, B., and Tyson, J.J. (1997). Proc. Natl. Acad. Sci. USA 94, 9147–9152.
Nurse, P. (2008). Nature 454, 424–426.
Ohki, M., and Sato, S. (1975). Nature 253, 654–656.
Roy, S., Ernst, J., Kharchenko, P.V., Kheradpour, P., Negre, N., Eaton, M.L., Landolin, J.M., Bristow, C.A., Ma, L., Lin, M.F., et al. (2010). Science 330, 1787–1797.
Rual, J.F., Venkatesan, K., Hao, T., Hirozane-Kishikawa, T., Dricot, A., Li, N., Berriz, G.F., Gibbons, F.D., Dreze, M., Ayivi-Guedehoussou, N., et al. (2005). Nature 437, 1173–1178.
Rustici, G., Mata, J., Kivinen, K., Lio, P., Penkett, C.J., Burns, G., Hayles, J., Brazma, A., Nurse, P., and Bahler, J. (2004). Nat. Genet. 36, 809–817.
Saunders, P.T., and Ho, M.W. (1976). J. Theor. Biol. 63, 375–384.
Scott, J.D., and Pawson, T. (2009). Science 326, 1220–1224.
Shah, S.P., Morin, R.D., Khattra, J., Prentice, L., Pugh, T., Burleigh, A., Delaney, A., Gelmon, K., Guliany, R., Senz, J., et al. (2009). Nature 461, 809–813.
Sioud, M. (2011). Methods Mol. Biol. 703, 173–187.
Suel, G.M., Garcia-Ojalvo, J., Liberman, L.M., and Elowitz, M.B. (2006). Nature 440, 545–550.
Towers, M., and Tickle, C. (2009). Int. J. Dev. Biol. 53, 805–812.
Tyson, J.J. (1983). J. Theor. Biol. 104, 617–631.
Tyson, J.J., Chen, K.C., and Novak, B. (2003). Curr. Opin. Cell Biol. 15, 221–231.
Velculescu, V.E., Zhang, L., Zhou, W., Vogelstein, J., Basrai, M.A., Bassett, D.E., Jr., Hieter, P., Vogelstein, B., and Kinzler, K.W. (1997). Cell 88, 243–251.
von Kriegsheim, A., Baiocchi, D., Birtwistle, M., Sumpton, D., Bienvenut, W., Morrice, N., Yamada, K., Lamond, A., Kalna, G., Orton, R., et al. (2009). Nat. Cell Biol. 11, 1458–1464.
Wang, Y., Cui, T., Zhang, C., Yang, M., Huang, Y., Li, W., Zhang, L., Gao, C., He, Y., Li, Y., et al. (2010). J. Proteome Res. 9, 6665–6677.
Zhang, R., and Lin, Y. (2009). Nucleic Acids Res. 37, D455–D458.
Zhong, Q., Simonis, N., Li, Q.R., Charloteaux, B., Heuze, F., Klitgord, N., Tam, S., Yu, H., Venkatesan, K., Mou, D., et al. (2009). Mol. Syst. Biol. 5, 321.

Leading Edge Essay

Informing Biological Design by Integration of Systems and Synthetic Biology

Christina D. Smolke1,* and Pamela A. Silver2,*
1Department of Bioengineering, Stanford University, 473 Via Ortega, Stanford, CA 94305-4201, USA
2Department of Systems Biology and Wyss Institute of Biologically Inspired Engineering, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA
*Correspondence: [email protected] (C.D.S.), [email protected] (P.A.S.)
DOI 10.1016/j.cell.2011.02.020

Synthetic biology aims to make the engineering of biology faster and more predictable. In contrast, systems biology focuses on the interaction of myriad components and how these give rise to the dynamic and complex behavior of biological systems. Here, we examine the synergies between these two fields.

Biology is the technology of this century. The potential uses of biology to improve the human condition and the future of the planet are myriad. Over the last century, humans have used biology to make many useful things, in part based on discoveries from molecular biology. In addition, researchers have redesigned biological systems to test our fundamental understanding of their components and integrated functions. However, the complexity and reliability of engineered biological systems still cannot approach the diversity and richness exhibited by their natural counterparts. It is then the combined promise of systems biology and synthetic biology that may drive transformative advances in our ability to program biological function.

One recent example of the successful engineering of a biological system to address a global challenge in health and medicine is the creation of microbes that produce a precursor to the antimalarial drug artemisinin (Ro et al., 2006). By shifting synthesis from the natural production host (a plant) to one more optimized for rapid production times and inexpensive scale up (a microorganism), researchers were able to develop a process that enabled cheaper supply of this drug, providing a more accessible cure for a disease devastating third world countries. However, the research phase of this project required an investment of over $25 million and 150 person-years of highly trained researcher effort. This investment cannot realistically be replicated for every chemical or material to which we would apply this approach. Instead, imagine a time when a bioengineer designs a system at the computer, orders the necessary DNA encoding the specified system, and then begins the actual experiment of turning it into life. Thus, one overarching goal of synthetic biology is to make the engineering of biology faster, affordable, and more predictable.

Biological systems and their underlying components offer a number of functional parallels with engineered systems. For example, biological sensors are exquisitely sensitive; the olfactory system can detect single odorant molecules and decode them. Biological systems can send and receive signals rapidly and in a highly specific manner. Pathways exist to sense and respond to the environment. Plants and microbes can use sunlight as an energy source. However, biological systems are also uniquely capable of self-replication, mutation, and selection, leading to evolution. Synthetic biologists aim to take advantage of these parallels and develop engineering principles for the design and construction of biological systems. However, an open question is whether we understand biological systems sufficiently to be able to redesign them to fulfill specific requirements.

Engineers enjoy the concept of interchangeable parts and modularity. Biology offers many sources of potential modularity but exhibits nonmodular features as well. For many years the gene was regarded as a fundamental modular unit of biology. As such, a gene is capable of transferring a particular phenotype to the organism. However, we now know that genes display more fine-grained modularity in the form of promoters, open reading frames (ORFs), and regulatory elements. mRNAs contain sequences important for proper intracellular targeting and degradation. Proteins often contain targeting sequences, reactive centers, and degradation sequences. And lastly, entire pathways are modular in that some signaling pathways can be transferred from one organism to another to reconstruct a new state in the engineered organism. This modularity underlies one of the core concepts of synthetic biology: the notion that one can assemble biological systems from well-defined "parts" or modules (Endy, 2005). However, modular assembly approaches have largely remained confounded by the effects of context, that is, the nonmodular aspects of biology. For example, where a gene or an associated regulatory element is located in the genome can impact expression and thus its function. In addition, the location of regulatory elements relative to each other and ORFs can impact their encoded function (Haynes and Silver, 2009). Further analyses provided by systems biology may help to guide the development of standard strategies for assembling genetic modules into functional units.

Approaches to Synthetic Biology

Given that the goals of synthetic biology are to make the engineering of biology faster and more predictable, and to harness the power of biology for the common good, the development of new approaches that support the design and construction of genetic systems has been a core activity within the field. Although advances have been made in both areas (fabrication and design), our ability to construct large genetic systems currently surpasses our ability to design such systems, resulting in a growing "design gap" that is a critical issue that synthetic biologists must address.

The ability to synthesize large pieces of DNA corresponding to operons, entire pathways, chromosomes, and genomes in a rapid and predictable way is a key approach to system fabrication. Systems biology has provided numerous templates with the abundance of sequenced genomes being deposited daily into publicly accessible databases. Some progress has recently been reported, including the resynthesis of a bacterial genome and its successful insertion into a different bacterial host (Gibson et al., 2010). However, it took researchers nearly 15 years and approximately $30 million to develop various fundamental aspects of this project. Much of this time and cost was methods development that will hopefully reduce the resources needed to carry out such projects in the future. In addition, new high-throughput methods for large-scale DNA synthesis have been recently described (Matzas et al., 2010; Norville et al., 2010; Tian et al., 2009). However, much more work is still needed to develop these technologies to the point where they are accessible to the majority of researchers (that is, in terms of cost and reliability), and systems biology may provide important clues. For example, faster and more reliable ways to synthesize large pieces of DNA may be uncovered by examination of new organisms and thereby reveal new nonchemical methods for DNA synthesis.

A second approach is to develop the methods to generate new component functions that can act as sensors, regulators, controllers, and enzyme activities, for example. These components will in turn extend the set of parts from which synthetic biologists can build genetic devices and systems. Synthetic biologists work not only with design of DNA that encodes genetic circuits but also with molecular design of biomolecules, such as RNAs and proteins, to perform new functions. Substantial efforts in the field of protein engineering have contributed to the diversity of functions exhibited by protein components (Dougherty and Arnold, 2009). However, even with these advances, the diversity of component activities that is currently available as parts has been limited, thus limiting the design of genetic circuits. Systems biology may aid in the development of effective strategies for generating new component functions by providing information on how Nature has evolved different functions for macromolecules.

A third approach is the predictable design of complex genetic circuits that lay the foundation for new biological devices and systems. Many circuits designed and built thus far have relied on our fairly detailed knowledge of how gene transcription is regulated. For example, synthetic circuits have applied concepts of positive and negative feedback to generate systems that sense stimuli, remember past events, and promote cell death in both prokaryotic and eukaryotic cells (Burrill and Silver, 2010; Sprinzak and Elowitz, 2005). However, many of these systems have been built in a fairly ad hoc manner, requiring substantial troubleshooting and iterative design to exhibit desired functions, and lack the robust performance standards one might expect as an engineer. Going forward, synthetic biologists need to better understand the parts underlying system design, how to predict their function in a particular genetic context, and how to predict their integrated function with other system parts (Ellis et al., 2009; Savageau, 2011). This biological understanding will then be integrated with computational models to develop computer-aided design tools.

What Does Systems Biology Mean to Synthetic Biology?

As with synthetic biology, many different types of research have been categorized as systems biology. Broadly speaking, systems biology represents an approach to biological research that focuses on the interactions between components of a biological system and how those interactions give rise to the dynamic behavior of the system in contrast to the more traditional molecular biologists' reductionist approach of studying components in isolation from each other (Alon, 2007). Systems biology has been associated with new technologies and methods that allow for quantitative measures of components and component interactions within biological systems, particularly those that allow for genome-wide measurements. In addition, because many of these technologies result in large datasets, systems biology has also been associated with computational tools that support the integration and analysis of these datasets to identify static relationships and interactions between components. Finally, as one of the ultimate goals of systems biology is to be able to predict a system's dynamic behavior from the component parts, computational tools that can model biological systems-level function from underlying components are associated with this field.

However, there are currently a number of challenges and limitations facing the field of systems biology. Paramount is determining how to correctly analyze and draw valid conclusions from large amounts of different types of data ranging from genomics and metabolomics to molecular dynamics in many single cells. Effectively addressing this problem may require new mathematical and computer science approaches. A second key challenge is knowing what kind of measurements to make and how accurate these measurements need to be to fully understand a biological system. Effectively addressing this challenge will require a re-evaluation of how measurements have been made over the past 10 years in systems biology (for instance, the movement from two-hybrid interactions to mass spectrometry to measure protein interactions). It will also require the development of even more sensitive strategies to make time-dependent measurements inside many cells simultaneously. Taken together, systems biology is confronted with the problem of both sensitivity and scale.

Does the ultimate goal of synthetic biology of the predictable design, construction, and characterization of biological systems rely on findings and approaches from systems biology? Design, analysis, and understanding are integrally linked in engineering methodology. Therefore, it is reasonable to assume that advances gained through systems biology in our understanding of how biological components interact to form integrated systems will support efforts in synthetic biology to design engineered biological systems. However, there is a different viewpoint that argues that the design principles that systems biologists elucidate for natural biological systems are products of evolution over many millions of years and thus are limited by the history of what came before. It is possible, then, that the design principles elucidated for natural biological systems may not be necessary or optimal for the engineered systems that synthetic biologists may design from scratch on a computer with less of a restriction of generating new function through evolutionary processes and timescales. Both of these views have merit, and the reality is likely somewhere in between: even if synthetic biologists design biological systems to have certain properties that are not generally found in natural systems (i.e., optimized for troubleshooting, tailoring, reuse, removal, designer identification), a greater understanding of how components interact to form integrated systems will inform and support the design process.

Synergy between Systems and Synthetic Biology

Although synthetic biology did not directly emerge from systems biology, there are important parallels between the two fields. Both systems biology and synthetic biology represent fundamental shifts in approaches from the fields they grew out of. Whereas systems biology represents a shift in the more traditional reductionist approach taken in biological research from studying components in isolation to studying integrated components, synthetic biology represents a shift in emphasizing engineering principles and methodology in building biological systems from more traditional genetic engineering research. In addition, both fields take a bottom-up approach, with systems biology emphasizing the understanding of biological systems from the underlying components and synthetic biology emphasizing building biological systems from modular components.

In examining the parallels between the two fields, it is also useful to examine how the key challenges each field is currently facing relate to one another (Figure 1). The challenges synthetic biologists currently face in engineering genetic systems can be classified as relating to either limitations in understanding biological systems or limitations in technical capabilities to study biological systems. The challenges systems biologists currently face in understanding biological systems are related to the complexities associated with studying natural biological systems and the inadequacies of current computational models to capture the physical properties of biological systems. We see several areas where these two fields can be brought together to effectively address these challenges.

Figure 1. The Challenges and Synergies for Systems and Synthetic Biology

The richness and complexity of engineered genetic networks, which synthetic biologists could build, will be advanced by using the knowledge gained through systems biology research. For example, genome sequencing can provide an increased diversity of biological parts that synthetic biologists can use in their gene circuit designs. More importantly, systems biology will provide not just the physical parts but a fundamental understanding of how these components can be integrated effectively with other components and how biological systems integrate diverse components and regulatory mechanisms to achieve robust information transmission and behaviors. The importance of this contribution is highlighted by the limited diversity of parts and regulatory mechanisms that have been integrated into synthetic gene circuits to date, in which the majority of engineered systems rely on a limited number of transcriptional regulators and do not exhibit robust behaviors over different timescales and environmental conditions (Elowitz and Leibler, 2000; Gardner et al., 2000; Purnick and Weiss, 2009). In order to move toward the design of integrated genetic systems, synthetic biologists will need to design more sophisticated genetic circuits that utilize diverse regulatory strategies (specifically, the integration of posttranscriptional and posttranslational mechanisms), balance energetic load, and dynamically modulate system behavior (Lim, 2010; Win et al., 2009).

Another important contribution of systems biology to synthetic biology is associated with the technologies and tools for analyzing biological systems. Synthetic biologists often spend the bulk of their effort in a design, characterization, and optimization loop, where original designs are modified based on characterization data to achieve the desired system behavior. The tools developed by systems biologists to study components in a system and their interactions can be applied to analyzing synthetic systems and troubleshooting the system performance. This is particularly true in cases in which the synthetic gene network may have unanticipated effects on native pathways in the cell that may in turn affect system behavior. A common example of this challenge can arise in engineering metabolic pathways, for which synthetic biologists can use genome-wide profiling of transcript, protein, or metabolite levels to identify undesired effects of introducing the synthetic pathway in the host cell on critical functions such as redox balance, cofactor levels, and stress response (Mukhopadhyay et al., 2008). As another example, systems biologists have developed a variety of computational tools for modeling biological systems and sharing information on biological components across different databases. These tools will be useful foundations for synthetic biologists looking to develop methods to standardize and share information across component libraries and develop computer-aided design tools for biological systems.

Advances in synthetic biology will provide key contributions to systems biology research by creating new tools for interfacing with and manipulating biological systems. Research aimed at understanding a biological system often utilizes methods to perturb or manipulate that system and examine the resulting behavior of the modified system. Synthetic biologists are developing novel genetic devices that can be used by systems biologists to interface with native networks and precisely probe and manipulate those systems. For example, synthetic genetic devices have recently been used to rewire signaling pathways and create novel interactions between unrelated cellular components (Culler et al., 2010; Lim, 2010). In addition, synthetic biology can contribute strategies for simplifying and isolating biological components and their interactions through the application of diverse approaches for implementing specific component interactions.

Synthetic biology can also provide new simulation platforms for systems biology. For example, systems biologists currently develop mathematical models to represent the behavior of their systems and use these models to predict the behavior of their systems under different perturbations and environments. However, the development of these models often requires assumptions that are imperfect matches for the physical model of a cell (i.e., hard sphere, dilute gas models), such that the ability of current computational models to capture system behavior is limited at best. The potential advances in constructing genetic systems coming from synthetic biology research may enable systems biologists to shift from computational models to physical models for their systems by implementing simulations inside of cells. Specifically, scalable and affordable DNA synthesis technology can allow systems biologists to build many modified versions of natural systems to test their understanding of those systems.

Perspective of the Future

Moving forward, the synergy between synthetic and systems biology will drive transformative advances in biotechnology. The impact includes not only further understanding of the complexity of biological systems but the ability to use this information to, for instance, design better drugs, commodity manufacturing processes, and cell-based therapies (Ducat et al., 2011). As one example, the complexity of biosynthesis processes that can be engineered has been recently advanced through the integration of a number of pathway construction and optimization tools, including genomic discovery and engineering (Bayer et al., 2009; Ro et al., 2006; Wang et al., 2009), in vivo screens for enzyme activity (Pfleger et al., 2006), and enzyme localization strategies (Dueber et al., 2009). Future efforts will focus on the development of more advanced tools for bioprocess optimization, such as those enabling noninvasive monitoring of pathway flux (Win and Smolke, 2007), closed loop embedded control of biosynthesis system behavior (Dunlop et al., 2010; Farmer and Liao, 2000), and biosynthesis compartmentalization and specialization. As another example, systems engineering strategies will play key roles in addressing current challenges in cellular therapies by enabling the programming of cell-fate decisions (Culler et al., 2010), differentiated states (Deans et al., 2007), improved engraftment and targeting (Chen et al., 2010), and effective kill switches (Callura et al., 2010). Ultimately, researchers will design systems that incorporate evolution, designing gene circuits that exhibit desired, evolvable behaviors and eventually constructing ecosystems that exhibit dynamic and predictable behavior patterns.

However, it is important to look at history in thinking about the promises and risks of synthetic biology. Molecular biology, and in particular the insertion of foreign genes into microbes, was met with circumspection by both the public and scientific communities. At the time, scientists made promises to the public (for example, the production of human insulin by engineered bacteria) and delivered on at least some of these promises (Villa-Komaroff et al., 1978). So, what can we expect from the interplay between systems biology and synthetic biology in the near and long term? In the near term, we have already seen companies promise to deliver on new fuels and carbon-based products (such as plastics), and in 5 years time this will be a partial reality, thereby starting to take petroleum out of the production loop. We believe that, in 10 years time, many high-value commodities, including drugs, will be produced biologically as the result of synthetic biology efforts. In the much longer time frame of 20 to 50 years, we hope that synthetic biology will lead to new cell-based therapies, the expansion of immunotherapy, synthetic organs and tissues, and rebuilding devastated environments and ecosystems.

These anticipated futures bring us to the controversial areas in synthetic biology. How do we think about a future that could involve the reprogramming of entire organisms? Should we consider engineering ecosystems to support sustainable agriculture, environmental remediation, and pathogen removal and to treat human disease? How far should and can we go in reprogramming life to form new types of cells, tissues, and entire organisms? These are only some of the potential benefits and questions scientists, engineers, policy makers, governments, and, most importantly, the public will need to ponder. Molecular biologists set standards for safe use of engineered organisms over 30 years ago. However, as research in synthetic biology is advancing toward the goals of making biology easier to engineer, the issues of safety and ethical use are being revisited as we write this Essay. In fact, a recent US government report captures many of the critical issues around public benefits and responsible stewardship (Presidential Commission for the Study of Bioethical Issues, 2010).

Although each field could in principle exist without the other, we instead feel that the natural interplay between design, analysis, and understanding highlights the important relationship between systems biology and synthetic biology. Systems biology brings added layers of information that will further empower future efforts to design synthetic biological systems. Synthetic biology brings new technologies and tools that can be applied to effectively test our understanding of natural biological systems. By integrating the contributions of these rapidly evolving fields, scientists and engineers together will be well positioned to transform health, well-being, and the environment in the years to come.

ACKNOWLEDGMENTS

P.A.S. is supported by funds from the NIH, DOD, DOE, NSF, and the Wyss Institute for Biologically Inspired Engineering. C.D.S. is supported by funds from the NIH, NSF, and the Alfred P. Sloan Foundation. The authors thank Drew Endy and Jeff Way for comments.

REFERENCES

Alon, U. (2007). An Introduction to Systems Biology: Design Principles of Biological Circuits (Boca Raton, FL: Chapman and Hall/CRC Press).
Bayer, T.S., Widmaier, D.M., Temme, K., Mirsky, E.A., Santi, D.V., and Voigt, C.A. (2009). J. Am. Chem. Soc. 131, 6508–6515.
Burrill, D.R., and Silver, P.A. (2010). Cell 140, 13–18.
Callura, J.M., Dwyer, D.J., Isaacs, F.J., Cantor, C.R., and Collins, J.J. (2010). Proc. Natl. Acad. Sci. USA 107, 15898–15903.
Chen, Y.Y., Jensen, M.C., and Smolke, C.D. (2010). Proc. Natl. Acad. Sci. USA 107, 8531–8536.
Culler, S.J., Hoff, K.G., and Smolke, C.D. (2010). Science 330, 1251–1255.
Deans, T.L., Cantor, C.R., and Collins, J.J. (2007). Cell 130, 363–372.
Dougherty, M.J., and Arnold, F.H. (2009). Curr. Opin. Biotechnol. 20, 486–491.
Ducat, D.C., Way, J.C., and Silver, P.A. (2011). Trends Biotechnol. 29, 95–103.
Dueber, J.E., Wu, G.C., Malmirchegini, G.R., Moon, T.S., Petzold, C.J., Ullal, A.V., Prather, K.L., and Keasling, J.D. (2009). Nat. Biotechnol. 27, 753–759.
Dunlop, M.J., Keasling, J.D., and Mukhopadhyay, A. (2010). Syst. Synth. Biol. 4, 95–104.
Ellis, T., Wang, X., and Collins, J.J. (2009). Nat. Biotechnol. 27, 465–471.
Elowitz, M.B., and Leibler, S. (2000). Nature 403, 335–338.
Endy, D. (2005). Nature 438, 449–453.
Farmer, W.R., and Liao, J.C. (2000). Nat. Biotechnol. 18, 533–537.
Gardner, T.S., Cantor, C.C., and Collins, J.J. (2000). Nature 403, 339–342.
Gibson, D.G., Glass, J.I., Lartigue, C., Noskov, V.N., Chuang, R.Y., Algire, M.A., Benders, G.A., Montague, M.G., Ma, L., Moodie, M.M., et al. (2010). Science 329, 52–56.
Haynes, K., and Silver, P.A. (2009). J. Cell Biol. 187, 589–596.
Lim, W.A. (2010). Nat. Rev. Mol. Cell Biol. 11, 393–403.
Matzas, M., Stahler, P.F., Kefer, N., Siebelt, N., Boisguerin, V., Leonard, J.T., Keller, A., Stahler, C.F., Haberle, P., Gharizaden, B., et al. (2010). Nat. Biotechnol. 28, 1291–1294.
Mukhopadhyay, A., Redding, A.M., Rutherford, B.J., and Keasling, J.D. (2008). Curr. Opin. Biotechnol. 19, 228–234.
Norville, J.E., Derda, R., Drinkwater, K.A., Leschziner, A.E., and Knight, T.R. (2010). J. Biol. Eng. 4, 17.
Pfleger, B.F., Pitera, D.J., Smolke, C.D., and Keasling, J.D. (2006). Nat. Biotechnol. 24, 1027–1032.
Presidential Commission for the Study of Bioethical Issues. (2010). http://www.bioethics.gov/.
Purnick, P.E., and Weiss, R. (2009). Nat. Rev. Mol. Cell Biol. 10, 410–422.
Ro, D.K., Paradise, E.M., Ouellet, M., Fisher, K.J., Newman, K.L., Ndungu, J.M., Ho, K.A., Eachus, R.A., Ham, T.S., Kirby, J., et al. (2006). Nature 440, 940–943.
Savageau, M. (2011). Ann. Biomed. Eng. Published online January 4, 2011. 10.1007/s10439-010-0220-2.
Sprinzak, D., and Elowitz, M.B. (2005). Nature 438, 443–448.
Tian, J., Ma, K., and Saaem, I. (2009). Mol. Biosyst. 5, 14–22.
Villa-Komaroff, L., Efstradiadis, A., Broome, S., Lomedico, P., Tizard, R., Naber, S.P., Chick, W.L., and Gilbert, W. (1978). Proc. Natl. Acad. Sci. USA 75, 3727–3731.
Wang, H.H., Isaacs, F.J., Carr, P.A., Sun, Z.Z., Xu, G., Forest, C.R., and Church, G.M. (2009). Nature 460, 894–898.
Win, M.N., and Smolke, C.D. (2007). Proc. Natl. Acad. Sci. USA 104, 14283–14288.
Win, M.N., Liang, J.C., and Smolke, C.D. (2009). Chem. Biol. 16, 298–310.

Leading Edge Minireview

Boosting Signal-to-Noise in Complex Biology: Prior Knowledge Is Power

Trey Ideker,1,2,* Janusz Dutkowski,1 and Leroy Hood3
1Departments of Medicine and Bioengineering
2Institute for Genomic Medicine
University of California, San Diego, La Jolla, CA 92093, USA
3Institute for Systems Biology, Seattle, WA 98103, USA
*Correspondence: [email protected]
DOI 10.1016/j.cell.2011.03.007

A major difficulty in the analysis of complex biological systems is dealing with the low signal-to-noise inherent to nearly all large biological datasets. We discuss powerful bioinformatic concepts for boosting signal-to-noise through external knowledge incorporated in processing units we call filters and integrators. These concepts are illustrated in four landmark studies that have provided model implementations of filters, integrators, or both.

Introduction

Complexity is the grand challenge for science and engineering in the 21st century. Complex systems, by definition, have many parts in an intricate arrangement that gives rise to seemingly inexplicable or emergent behaviors. For example, a radio captures an electromagnetic signal and converts it through electronic circuitry into sound that we hear. To most, the radio is a black box with an input (electromagnetic waves) and an output (sound waves). However, understanding the inner workings of this box requires going head-to-head with the challenges of complexity. What are the component parts of the system and how are these parts interconnected? How do these connections influence functions and dynamic system outputs? In biology, ultimately one would like to create models that predict the emergent behaviors of complex entities, and even re-engineer these behaviors to humankind's benefit.

To decipher complexity, biologists have developed an impressive array of technologies (next-generation sequencing, tandem mass spectrometry, cell-based screening, and so on) that are capable of generating millions of molecular measurements in a single run. This enormous amount of data, however, is typically accompanied by a fundamental problem: an incredibly low signal-to-noise ratio. For example, the millions of single-nucleotide variants (SNVs) found in a typical genome-wide association study or by the International Cancer Genome Consortium (Hudson et al., 2010) make it extremely difficult to identify which particular SNVs are the true causes of disease. Due to the overwhelming number of measurements, such analyses either lack power to detect the true signal or must admit an unacceptable amount of noise.

Fortunately, biologists have two major weapons with which signal-to-noise may be improved. First is what we know about complexity, which can and should be used as strong prior assumptions when analyzing biological data. Known principles of complexity such as modularity, hierarchical organization, evolution, and inheritance (Hartwell et al., 1999) all provide important insights into how biological systems are constructed and how they function. Second is the availability of data in many complementary layers, including the genome, transcriptome, proteome, metabolome, and interactome. A recent wave of new bioinformatic methods has demonstrated how both weapons (strong prior assumptions related to complexity and systematic accumulation of complementary data) can be used together or separately to exact substantial increases in signal-to-noise.

In what follows, we summarize these developments within a general paradigm for signal detection in biology. Central to this paradigm are processing units we call filters and integrators, which draw on prior biological assumptions and complementary data to reduce noise and to boost statistical power. To illustrate these ideas in context, we review four landmark studies that have provided model implementations of filters and integrators.

The Signal Detection Paradigm

Imagine a biological dataset as a stream of information flowing into a hypothetical signal detection device (Figure 1A). The information flow is quantized into atomic units or events, representing measurements for entities such as genes or proteins, protein interactions, SNVs, pathways, cells, or individuals. Each event contains a certain amount of information, ranging from a single measurement (e.g., strength of protein interaction) to thousands (e.g., an SNV state or gene expression value over a population of patients). Some events represent true biological signals, with the definition of "signal" depending exquisitely on the type of results the experimentalist is looking for (e.g., an SNV causing disease or a true protein interaction; many examples are given later). The remaining events are noise, which can be due to errors that are technical in nature (uncontrollable variation in different instrument readings collected from the same sample) or biological in nature (uncontrollable variation in different samples collected from the same biological condition). An event may also be considered part of noise even if it is biological and

reproducible, simply because it encodes aspects of phenotype irrelevant to the current studies.

To make a decision on which events are signal, the device scores each event and accepts those for which the score exceeds a statistically defined decision threshold (Figure 1A). It is precisely this decision that becomes problematic in many large-scale biological studies, in which one either mistakenly rejects a large proportion of the true signal (low statistical power) or must tolerate a high proportion of accepted events that are noise (high false discovery rate or FDR).

Figure 1. Boosting Signal-to-Noise in Biological Data using Prior Knowledge
(A) Signal detection paradigm in which an input data stream is routed through a series of filtering and integration units, ending in a statistical test that makes accept or reject decisions. Symbols: m, information per event or sample size; Δt, effect size; α, decision threshold; FDR, false discovery rate.
(B) Probability distribution P(t) of the test statistic t over the entire data stream of signal plus noise (purple). This distribution is factored into a red signal and a blue noise component. FDR and power are visualized in terms of the areas under these curves to the right of α.
(C) Effect of varying parameters on the signal, noise, and signal plus noise probability distributions. The power is increased by more than 6-fold compared to (B), at an identical FDR. Colors are shown as in (B).
(D) MAGENTA, a specific implementation of the signal detection paradigm for pathway-based disease gene mapping as described in Segrè et al. (2010).

Boosting Signal with Filters and Integrators

To increase signal-to-noise, a pivotal trend in bioinformatics has been to augment the signal detection process with complementary datasets and with prior knowledge about the nature of signal. The vast majority of these approaches fall into either of two categories that we call filters and integrators (Table S1 available online). Filters attempt to cull some events from the information flow immediately and reject them as noise. For example, a detection system for differential expression might reject certain genes immediately if their expression levels fail to exceed a background value in any condition. Integrators, on the other hand, transform the information flow by aggregating individual events into larger units to yield a fundamentally new type of information, or by integrating together different types of information (Hwang et al., 2009). For example, genes might be aggregated into clusters of similar expression or of related function, in which the median levels of the clusters, not their individual genes, are propagated as the "events" on which final accept/reject decisions are made (Park et al., 2007). Importantly, the combining of filters or integrators results in a new device that itself can be recombined with other signal detection systems in a modular fashion.

Both filters and integrators influence statistical power and FDR, but by fundamentally different means. Filters reduce the fraction of noise passing through the system and, as a consequence, the FDR. Alternatively, as filters are added, FDR can be held constant by relaxing the decision threshold, resulting in higher statistical power (Figures 1B and 1C). By comparison, integrators combine a train of weak signals into fewer stronger events, leading to an increase in "effect size" and thus a direct increase in statistical power. These methods complement the more classical means of boosting power by increasing the amount of information per event (also called the sample size) (Figure 1A).

In each of the following four examples, boosting power with a combination of filters and integrators has been critical to the success of a landmark genome-scale analysis project.

Example 1: Pathway-Level Integration of Genome-wide Association Studies

Genome-wide association studies (GWAS) seek to identify polymorphisms, such as SNVs, that cause a disease or other phenotypic trait of interest. Despite the success of this strategy in mapping SNVs underlying many diseases, the identified loci typically explain only a small proportion of the heritable variation. For such diseases, one likely explanation is that the genetic contribution is distributed over many functionally related loci with large collective impact but with only modest individual effects that do not reach genome-wide significance in single-SNV tests (Wang et al., 2010; Yang et al., 2010).

Based on this hypothesis, Segrè et al. (2010) investigated the collective impact of mitochondrial gene variation in type II diabetes. They described a method called MAGENTA that performs a meta-analysis of many different GWAS to achieve larger sample sizes than any single study, thereby increasing statistical power. MAGENTA also includes both filtering and integration steps (Figure 1D). First, a filter is applied so that SNVs that fall far from genes are removed. Next an integrator is applied to transform SNVs to genes, such that each gene is assigned a

score equal to the most significant p value of association among its SNVs. Gene scores are further corrected for confounding factors such as gene size, number of SNVs per kilobase, and genetic linkage. Finally, a second integrator combines the scores across sets of genes assigned to the same biochemical function or pathway, resulting in a single pathway-level p value of association.

Simulation studies using MAGENTA suggest a potentially large boost in power to detect disease associations (Figure S1A). For example, the method has 50% power to detect enrichment for a pathway containing 100 genes of which 10 genes have weak association to the trait of interest. This performance is compared to only 10% power to detect any of the 10 genes at the single-SNV level. At this increased power, MAGENTA did not identify any mitochondrial pathways as functionally associated with type II diabetes, suggesting that mitochondria have overall low genetic contribution to diabetes susceptibility, a surprise given the conventional wisdom about the disease. On the other hand, in an independent analysis of genes influencing cholesterol, MAGENTA identified pathways related to fatty acid metabolism that had been missed by classical GWAS.

Example 2: Mapping Disease Genes in Complete Genomes

Sequencing and analysis of individual human genomes is one of the most exciting emerging areas of biology, made possible by the rapid advances in next-generation sequencing (Metzker, 2010). As complete genome sequencing becomes pervasive, one of the most important challenges will be to determine how such sequences should best be analyzed to map disease genes. The signal filtering and integration paradigm provides an excellent framework for developing methods in this arena. As a landmark example, Roach et al. (2010) described a filtering methodology for disease genes based on the complete genomic sequences of a nuclear family of four. This approach was used to identify just three candidate mutant genes, one of which underlies Miller syndrome, a rare recessive Mendelian disorder for which both offspring, but neither parent, were affected.

To begin the analysis, the four genome sequences were processed to identify approximately 3.7 million SNVs across the family. SNVs were then directed through a series of filters (Figure S2A). In the first, SNVs were rejected if they were unlikely to influence a gene-coding region annotated in the human genome reference map (http://genome.ucsc.edu/), leaving approximately 1% of SNVs that led to missense or nonsense mutations or fell precisely onto splice junctions. A second filter removed SNVs that were common in the human population and thus were unlikely to cause a rare Mendelian disorder. Like the first one, this filter yielded an approximate 100-fold decrease in the number of candidates. A third filter was designed to check inheritance patterns, which can be gleaned only from a family of related genomes. SNVs were removed that had a non-Mendelian pattern of inheritance (result of DNA sequencing errors) or did not segregate as expected for a recessive disease gene, in which each affected child must inherit recessive alleles from both parents. This filter yielded another 4- to 5-fold decrease in candidate SNVs versus using only a single parental genome. Finally, an integrator was used to translate all remaining SNVs into their corresponding genes.

Using the entire system of filters and integrators under a compound heterozygote recessive model, a total of three genes were identified as candidates. One of these (DHODH) was concurrently shown to be the cause of Miller syndrome. In this way, the family genome sequencing approach used the principles of Mendelian genetics (prior knowledge) to correct approximately 70% of the sequencing errors, directly identify rare variants (those present in two or more family members), and reduce enormously the search space for disease traits (corresponding to an increase in statistical power from 0.15% to 33%) (Figure S1B).

Example 3: Assembly of Global Protein Signaling Networks

Another area in which filtering and integration are turning out to be key is assembly of protein networks. An excellent example of network assembly is provided by the recent work of Breitkreutz et al. (2010), in which mass spectrometric analysis was used to report a high-quality network of 1844 interactions centered on yeast kinases and phosphatases. Central to the task of network assembly was a signal detection system for quality control and interpretation of the raw data. The data consisted of a stream of more than 38,000 proteins that had been coimmunoprecipitated with a different kinase or phosphatase used as bait. Bait proteins can interact both specifically and nonspecifically with a wide variety of peptides, and the nonspecific interactions comprise a major source of noise. To remove nonspecific interactions, the authors introduced a method called significance analysis of interactome (SAINT), in which each putative interacting protein is assigned a likelihood of true interaction based on its number of peptide identifications (representing the amount of information per event or sample size) (Figure S2B). After filtering, the remaining protein interactors are funneled to an integrator stage in which they are clustered into modules based on their overall pattern of interactions (Table S1).

The resulting modular interaction network reveals an unprecedented level of crosstalk between kinase and phosphatase units during cell signaling. In this network, kinases and phosphatases are not mere cascades of proteins ordered in a linear fashion. Rather, they are more akin to the neurons of a vast neural network, in which each kinase integrates signals from myriad others, enabling the network to sense cell states, compute functions of these states, and drive an appropriate cellular response. It is likely that evolution tunes this network, such that some interactions dominate and others are minimized in a species-specific fashion. This might help explain two paradoxical effects seen pervasively in both signaling and regulation: (1) the same network across species can be used to control very different phenotypes (McGary et al., 2010); and (2) very different networks across species can be used to execute near identical responses (Erwin and Davidson, 2009).

Example 4: Filtering Gene Regulatory Networks using Prior Knowledge

One of the grand challenges of biology is to decipher the networks of transcription factors and other regulatory

components that drive gene expression, phenotypic traits, and complex behaviors (Bonneau et al., 2007). Toward this goal, probabilistic frameworks such as Bayesian networks have been extensively applied to learn gene regulatory relationships from mRNA expression data gathered over multiple time points and/or experimental conditions (Friedman, 2004). However, due to a limited sample size, large space of possible networks, and probabilistic equivalence of many alternative models, these approaches are often unable to find the underlying causal gene relationships.

Recently, Zhu et al. (2008) showed that supplementing gene expression profiles with complementary information on genotypes may help to overcome some of these problems (Figure S2C). These authors sought to assemble a gene regulatory network for the yeast Saccharomyces cerevisiae using previously published mRNA expression profiles gathered for 112 yeast segregants. Rather than assemble a Bayesian network from expression data alone, the data were first supplemented with the genotypes of each segregant. The combined dataset was then analyzed to identify expression quantitative trait loci (eQTL), genetic loci for which different mutant alleles associate with differences in expression for genes at the same locus (cis-eQTL) or for genes located elsewhere in the genome (trans-eQTL). The eQTLs were used as a filter to prioritize some gene relations and demote others. Any candidate cause-effect relations in which the effect gene is near an eQTL were removed, as the cis-eQTL already explains the gene expression changes at that locus. Conversely, cause-effect relations that were supported by trans-eQTLs and passed a formal causality test were prioritized. Supplementing gene expression profiles with genetic information significantly enhanced the power to identify bona fide causal gene relationships. Further improvement was achieved by introducing a second filter that prioritized cause-effect relations that correspond to measured physical interactions, including data from the many genome-wide chromatin immunoprecipitation experiments published for yeast that document physical interactions between transcription factors and gene promoters.

Summary

Biology is expanding enormously in its ability to decipher complex systems. This ability derives from the expanded power to incorporate diverse and complementary data types and to inject prior understanding of biological principles. Signal detection systems such as those discussed here, along with their filters, integrators, and other components, are leading to fundamental new biological discoveries and models, some of which will ultimately transform our understanding of disease and therapeutics. It is also likely that many of the strategies, technologies, and computational tools developed for healthcare can be applied to problems of complexity inherent in other scientific domains, including energy, agriculture, and the environment. Healthcare and energy will demand significant societal resources moving forward, and hence offer unique opportunities to push the development and application of approaches for attacking complexity.

SUPPLEMENTAL INFORMATION

Supplemental Information includes two figures and one table and can be found with this article online at doi:10.1016/j.cell.2011.03.007.

ACKNOWLEDGMENTS

We gratefully acknowledge G. Hannum, S. Choi, I. Shmulevich, D. Galas, J. Roach, and N. Price for helpful comments and feedback. This work was funded by grants from the National Center for Research Resources (RR031228, T.I., J.D.), the National Institute of General Medical Sciences (GM076547, L.H.; GM070743 and GM085764, T.I.), the Department of Defense (W911SR-07-C-0101, L.H.), and the Luxembourg strategic partnership (L.H.).

REFERENCES

Bonneau, R., Facciotti, M.T., Reiss, D.J., Schmid, A.K., Pan, M., Kaur, A., Thorsson, V., Shannon, P., Johnson, M.H., Bare, J.C., et al. (2007). Cell 131, 1354–1365.
Breitkreutz, A., Choi, H., Sharom, J.R., Boucher, L., Neduva, V., Larsen, B., Lin, Z.Y., Breitkreutz, B.J., Stark, C., Liu, G., et al. (2010). Science 328, 1043–1046.
Erwin, D.H., and Davidson, E.H. (2009). Nat. Rev. Genet. 10, 141–148.
Friedman, N. (2004). Science 303, 799–805.
Hartwell, L.H., Hopfield, J.J., Leibler, S., and Murray, A.W. (1999). Nature 402, C47–C52.
Hudson, T.J., Anderson, W., Artez, A., Barker, A.D., Bell, C., Bernabe, R.R., Bhan, M.K., Calvo, F., Eerola, I., Gerhard, D.S., et al. (2010). Nature 464, 993–998.
Hwang, D., Lee, I.Y., Yoo, H., Gehlenborg, N., Cho, J.H., Petritis, B., Baxter, D., Pitstick, R., Young, R., Spicer, D., et al. (2009). Mol. Syst. Biol. 5, 252.
McGary, K.L., Park, T.J., Woods, J.O., Cha, H.J., Wallingford, J.B., and Marcotte, E.M. (2010). Proc. Natl. Acad. Sci. USA 107, 6544–6549.
Metzker, M.L. (2010). Nat. Rev. Genet. 11, 31–46.
Park, M.Y., Hastie, T., and Tibshirani, R. (2007). Biostatistics 8, 212–227.
Roach, J.C., Glusman, G., Smit, A.F., Huff, C.D., Hubley, R., Shannon, P.T., Rowen, L., Pant, K.P., Goodman, N., Bamshad, M., et al. (2010). Science 328, 636–639.
Segrè, A.V., Groop, L., Mootha, V.K., Daly, M.J., and Altshuler, D. (2010). PLoS Genet. 6, e1001058.
Wang, K., Li, M., and Hakonarson, H. (2010). Nat. Rev. Genet. 11, 843–854.
Yang, J., Benyamin, B., McEvoy, B.P., Gordon, S., Henders, A.K., Nyholt, D.R., Madden, P.A., Heath, A.C., Martin, N.G., Montgomery, G.W., et al. (2010). Nat. Genet. 42, 565–569.
Zhu, J., Zhang, B., Smith, E.N., Drees, B., Brem, R.B., Kruglyak, L., Bumgarner, R.E., and Schadt, E.E. (2008). Nat. Genet. 40, 854–861.
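
As a rough illustration of the filter and integrator stages described for MAGENTA in Example 1, the following Python sketch routes a toy stream of SNV events through one filter and two integrators before a final accept/reject decision. It is a minimal sketch under stated assumptions, not the published method: the function names, the record fields (`gene`, `distance_bp`, `p`), the distance cutoff, and all numbers are hypothetical, the confounder correction of gene scores is omitted, and the pathway-level integrator is replaced by a simple stand-in (the fraction of pathway genes whose score passes a gene-level cutoff) rather than a pathway-level p value.

```python
def filter_snvs(snvs, max_distance_bp):
    """Filter: reject SNVs that fall farther than max_distance_bp from a gene."""
    return [s for s in snvs if s["distance_bp"] <= max_distance_bp]


def integrate_to_genes(snvs):
    """Integrator 1: score each gene by the most significant (smallest)
    p value among its remaining SNVs."""
    gene_p = {}
    for s in snvs:
        gene_p[s["gene"]] = min(gene_p.get(s["gene"], 1.0), s["p"])
    return gene_p


def integrate_to_pathways(gene_p, pathways, gene_cutoff):
    """Integrator 2 (hypothetical stand-in): score each pathway by the
    fraction of its scored genes with p <= gene_cutoff."""
    scores = {}
    for name, genes in pathways.items():
        scored = [gene_p[g] for g in genes if g in gene_p]
        if scored:  # skip pathways with no scored genes
            scores[name] = sum(p <= gene_cutoff for p in scored) / len(scored)
    return scores


def accept(scores, threshold):
    """Decision step: accept events whose score reaches the threshold."""
    return sorted(k for k, v in scores.items() if v >= threshold)


# Toy input stream; every value here is invented for illustration.
snvs = [
    {"id": "snv1", "gene": "G1", "distance_bp": 0, "p": 0.04},
    {"id": "snv2", "gene": "G1", "distance_bp": 500, "p": 0.001},
    {"id": "snv3", "gene": "G2", "distance_bp": 200_000, "p": 1e-6},
    {"id": "snv4", "gene": "G2", "distance_bp": 100, "p": 0.2},
    {"id": "snv5", "gene": "G3", "distance_bp": 10, "p": 0.03},
]
pathways = {"pathA": ["G1", "G3"], "pathB": ["G2"]}

kept = filter_snvs(snvs, max_distance_bp=100_000)
gene_scores = integrate_to_genes(kept)
pathway_scores = integrate_to_pathways(gene_scores, pathways, gene_cutoff=0.05)
accepted = accept(pathway_scores, threshold=0.5)
print(gene_scores, pathway_scores, accepted)
```

Each stage takes a stream of events and returns a new stream, so stages can be chained or swapped, which mirrors the modular recombination of filters and integrators described in the text. In this toy input the strongest single SNV lies far from any gene and is removed by the filter, while two modest gene-level scores combine into one accepted pathway-level event.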

Cell 144, March 18, 2011 ©2011 Elsevier Inc.

Leading Edge Perspective

Principles and Strategies for Developing Network Models in Cancer

Dana Pe’er1,2,* and Nir Hacohen3,4,5 1Department of Biological Sciences, Columbia University, 1212 Amsterdam Avenue, New York, NY 10027, USA 2Center for Computational Biology and Bioinformatics, Columbia University, 1130 St. Nicholas Avenue, New York, NY 10032, USA 3Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA 4Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital, 149 13th Street, Charlestown, MA 02129, USA 5Department of Medicine, Harvard Medical School, Boston, MA 02115, USA *Correspondence: [email protected] DOI 10.1016/j.cell.2011.03.001

The flood of genome-wide data generated by high-throughput technologies currently provides biologists with an unprecedented opportunity: to manipulate, query, and reconstruct functional molecular networks of cells. Here, we outline three underlying principles and six strategies to infer network models from genomic data. Then, using cancer as an example, we describe experimental and computational approaches to infer "differential" networks that can identify genes and processes driving disease phenotypes. In conclusion, we discuss how a network-level understanding of cancer can be used to predict drug response and guide therapeutics.

Cells contain a vast array of molecular structures that come together to form complex, dynamic, and plastic networks. The recent development of high-throughput, massively parallel technologies has provided biologists with an extensive, although still incomplete, list of these cellular parts. The emerging challenge over the next decade is to systematically assemble these components into functional molecular and cellular networks and then to use these networks to answer fundamental questions about cellular processes and how diseases derail them. For example, how do these cellular components come together to robustly maintain homeostasis, process exogenous and endogenous signals, and then coordinate responses? How do genetic aberrations disrupt the regulatory network and manifest in disease, such as cancer? In this Perspective, we reason that, even with a partial understanding of molecular networks, biologists are currently poised to understand how networks are deregulated in cancer cells and then predict how these networks might respond to drugs.

Quantitative biophysical network models encompassing a small number of components have made enormous contributions to our understanding of cellular networks. However, in this Perspective, we focus on deriving network models at a large systems scale from high-throughput data, using "data-driven network inference." In this process, a set of modeling assumptions are defined, such as "genetic aberrations alter normal cellular regulation and drive tumor proliferation." Then, data are used to derive a specific model, such as specifying for each tumor, which typically harbors many aberrant genes, which particular genes drive proliferation. In the end, a "good" model of biological networks should be able to predict the behavior of the network under different conditions and perturbations and, ideally, even help us to engineer a desired response. For example, where in the molecular network of a tumor should we perturb with drug to reduce tumor proliferation or metastasis? Such a global understanding of networks can have transformative value, allowing biologists to dissect out the pathways that go awry in disease and then identify optimal therapeutic strategies for controlling them.

To illustrate the potential impact of global models, we note that the effect of a cancer drug is often hard to predict because crosstalk and feedback are still poorly mapped in most signaling pathways. For example, the mammalian target of rapamycin (mTOR) is critical for cell growth, and its activity is aberrant in most cancers; hence, it was expected to be a good therapeutic target. Nevertheless, it shows poor results in clinical trials. This deviation from our expectations may be due to feedback and crosstalk between the Akt/mTOR and the extracellular signal-regulated kinase (ERK) pathways (Carracedo et al., 2008). Inhibition of mTOR releases feedback inhibition of the receptor tyrosine kinases, which can activate both ERK and Akt (O'Reilly et al., 2006) and subsequently increase cell proliferation.

For targeted therapy to succeed, a global view of the interconnectivity of signaling proteins and their influences is critical. In this Perspective, we consider the current state and potential future of data-driven computational approaches to network inference, with an emphasis on applications to cancer. We will describe three principles underlying molecular networks and inferring these from data. These principles are matched to current experimental capabilities and will need revamping as technological leaps produce new types of data (e.g., more quantitative data and with real-time dynamics). We then consider six promising experimental-computational strategies for constructing network-level models. Though not exhaustive, these principles and strategies illustrate fruitful directions in network biology and will hopefully stimulate discussion and experimentation among computational and experimental biologists.

Principle 1: Molecular Influences Generate Statistical Relations in Data

Network biology has been empowered by genomics technologies that enable the simultaneous measurement of thousands of molecular species. Such data offer a global unbiased view of the entire system, which in turn necessitates computation and statistics. The key underlying assumption frequently used for inferring networks from genomic data is that influences and interactions between biological entities generate statistical relations in the observed data. For example, if protein A induces expression of protein B, then we expect to see high levels of protein B whenever levels or specific molecular states of its activator A are high. The reverse of this logic is that statistical correlation between protein states indicates a potential interaction between them. In a data-driven manner, a computer can comprehensively test millions of such hypotheses in seconds and provide a statistical score for each candidate molecular interaction or influence. For example, one can test the statistical association between the DNA copy number of a candidate regulator and gene expression of a target for each locus and gene in the genome (see Strategy 4).

Various statistical frameworks have been successfully applied to network inference (Basso et al., 2005; Bonneau et al., 2007; Friedman et al., 2000); the commonality between the frameworks is that they model a target's behavior as a function of its regulators and search for the most predictive regulator set. For example, Bayesian networks were used to reconstruct detailed signaling pathway structures in human T cells using only the concentration of phosphoproteins simultaneously measured in individual cells (Sachs et al., 2005). Based solely on this data, this network analysis discovered the majority of known influences between the measured signaling components without prior knowledge of any pathways. Moreover, the analysis uncovered a new point of crosstalk, which was confirmed experimentally.

The same computational approach and mathematical formulae correctly reconstructed yeast metabolic networks from gene expression data (Pe'er et al., 2001). Together, these studies demonstrate the universal nature of statistical dependencies; the same formalism can be used to reconstruct yeast metabolic networks from gene expression data and mammalian signaling networks from phosphoprotein abundances.

Mathematical models of molecular networks have been derived from basic biochemical principles for decades, combining chemical reaction equations into a quantitative model. For example, Michaelis-Menten equations are frequently used to model transcription factor binding to DNA. Nevertheless, most contemporary data sets lack the quantitative and statistical power to resolve such models, even for small networks. Data-driven approaches typically necessitate hundreds of samples to gain the statistical power to resolve even a partial qualitative map of molecular interactions. Data requirements are highly dependent on the number of components modeled, the mathematical complexity of the equations representing the molecular interactions, and the effect size of the influences themselves. Thus, at the heart of data-driven modeling is finding the sweet spot in the tradeoff between more realistic (e.g., chemical reaction equations) and simpler models that can be inferred more robustly from data (e.g., linear regression).

One option is to build qualitative, rather than quantitative, models. These models can identify qualitative features such as "Mek (mitogen-activated protein kinase) activates Erk" or that "Met4 and Met28 are required together to induce sulfur metabolism." If quantitative modeling is important for the problem at hand, linear regression models provide a robust alternative to nonlinear models (e.g., target gene expression is a linear combination of its transcription factors). Although nonlinear relations frequently occur in biology, linear regression models are more robust, and thus they often give better results, even when the underlying model is nonlinear. A detailed molecular model that is exhaustive in its molecular species and in the modeling of their interactions remains beyond our reach for the near future.

A powerful strategy in systems biology is to abstract and simplify models. In the "module-network" approach (Segal et al., 2003), genes are grouped into modules that are assumed to share a regulatory program. The rationale for this grouping is based on numerous examples in which the same regulatory circuits coordinate activation or repression of groups of genes that are involved in the same process (e.g., the entire ribosome complex is regulated by common transcription factors). By pooling many similar genes together, the module-network framework significantly increases the statistical power to identify regulatory influences (Litvin et al., 2009).

Principle 2: Networks Are Not Fixed: The Role of Context and Dynamics

Molecular networks are not static; rather, they exhibit dynamic adaptations in response to both internal states and external signals. Influences that determine network context can be divided into four categories. (1) Genetic background strongly determines network behavior and gives rise to significant differences across individuals (and even cells in the special case of cancer). (2) Cell lineages have dramatically different network structures because of epigenetic changes and differential expression of genes. (3) Tissue milieu can reprogram networks and their behaviors, as stromal cells do for tumors. (4) Exogenous signals, such as nutrients and other chemicals, affect networks (Figure 1). Ultimately, health or disease emerges from an individual's integration of internal and external cues.

In cancer, context can have a profound impact on how patients respond to therapies. For example, in recent clinical trials of a new generation of rationally targeted therapies (e.g., Gleevec, Herceptin, and BRAF inhibitors for chronic myelogenous leukemia, breast cancer, and melanoma, respectively), even patients that share the targeted mutation and tumor type displayed substantially variable responses to the drugs (Sharma et al., 2010a). In addition, in another recent trial (i.e., phase II), a therapy was extremely effective at reversing tumors in metastatic melanoma patients carrying the oncogenic BRAF mutation (Flaherty et al., 2010), in which this drug effectively shuts down the ERK pathway that is critical for this cancer. Strikingly, however, the same drug leads to the activation of the ERK pathway in cells with wild-type BRAF (Poulikakos et al., 2010), potentially promoting tumors in these cells.

To gauge such network activity, response, and potential, experiments must deliberately perturb the cell. For example, blood cells from acute myeloid leukemia patients could not be

differentiated from healthy cells when only the basal levels of phosphorylation of key signaling molecules were measured. Only when the samples were interrogated with growth factors and cytokines did the resulting signaling profiles correlate with tumor genetics, drug response, and disease outcome (Irish et al., 2004). The importance of interrogation with stimuli comes into play because many important signaling responses, such as ERK2 activation in response to epidermal growth factor receptor (EGFR), depend only on fold change, rather than basal protein levels that exhibit a high degree of variance (Cohen-Saidon et al., 2009).

Cellular responses often involve multiple feedback loops and additional complexities (see Review by Yosef and Regev on page 886 of this issue). For example, the transcriptional response to EGF stimulation induces feedback attenuation factors, such as dual-specific phosphatases (DUSPs), which shut down the same pathways that activate EGF signaling (Amit et al., 2007). Therefore, to understand tumor network function, drug response, and the emergence of drug resistance, tumors must be systematically interrogated with different stimuli and drugs, followed by time series measurements. These measurements can then be used to derive a model describing the quantitative temporal sequence of events from the initial detection of an input to the tumor’s response. The goal would be to generate a model that has a reasonable chance of being able to predict responses to new, previously unmeasured inputs, such as new drugs or combinations of drugs.

Figure 1. Differential Networks Explain Phenotypic Variation across Contexts
The function of a molecular network is determined by context: genetics, tissue type, environment (e.g., nutrients), cell-cell communication, and small molecules. These influences combine to determine the phenotypic response. The “differential network” (colored nodes and edges) models the essential components that determine how and why a phenotypic response will vary between contexts.

Principle 3: Extracting “Differential” Networks
Given the importance of context, a central challenge for the field will be to collect data across multiple environments, cell types, and genetic backgrounds using genome-wide profiling to infer network connectivity and function in each context. Rather than explicitly modeling all of the moving parts of a network, we propose that it is feasible to derive models that focus on key components by capturing the essential differences in network wiring, function, and response between contexts (Figure 1). A “differential-network” model is designed to elucidate the following: How do a small number of changes to the network (e.g., genetic, epigenetic) alter the function of the network? At the center of such a model are the altered nodes (i.e., genes or proteins), and data-driven computation can be used to: (1) identify additional components that interact with these altered nodes; (2) qualify and quantify how these interactions are perturbed; and (3) model how these network perturbations continue to propagate through additional components to generate the phenotype of interest, such as proliferation, invasion, or drug response. For example, Carro et al. (2010) identify C/EBPβ and STAT3 as “master” transcription factors whose overexpression synergistically activates expression of mesenchymal genes and subsequent tumor aggressiveness in malignant glioma (see Strategy 3).

The network model can be significantly simplified because only the components that play a role in the modeled response need identification and inclusion. Importantly, the differential network strategy does not apply only to disease. It can be used in any context to address questions such as what is the difference between two cell types or how does nutrient status affect cellular behavior?

Here, we present six strategies that combine experimental and computational approaches to generate network inference models. Strategies 1 and 2 focus on identifying key components; Strategies 3 and 4 focus on deriving key network components concurrently with their regulatory influences; and Strategies 5 and 6 advance toward increasingly detailed quantitative models of network influences.

Strategy 1: Discovery of Inherited Alleles and Somatic Mutations
Chromosomal aberrations and mutations are a central characteristic of tumor cells. Multiple genetic aberrations collectively influence the expression of thousands of genes, altering the pathways and processes underlying malignant behaviors. The emergence of high-resolution copy number assays and massively parallel sequencing technologies opens the possibility of tracing phenotypic differences back to their genetic source. Large-scale initiatives are currently sequencing thousands of tumor genomes to comprehensively catalog the prevalent sequence mutations and chromosomal aberrations underlying each cancer type. Indeed, entire cancer genomes have already been sequenced in dozens of tumors, revealing a surprising degree of mutations and chromosomal aberrations in each

individual cancer (Stephens et al., 2009). On the other hand, exon capture techniques, called exome sequencing (Ng et al., 2010), concentrate on the 1% of coding sequence in the human genome. This technique enables a more economical cataloging of coding mutations in cohorts of hundreds of tumors per cancer type. Finally, transcriptome (or RNA) sequencing identifies expressed coding and noncoding RNA mutations. Transcriptome sequencing also reveals fusion genes created by intronic translocations, which are therefore undetected by exon sequencing techniques (Maher et al., 2009).

These large-scale sequencing projects have uncovered a staggering diversity of genetic aberrations across tumors. Although each individual tumor typically harbors a large number of aberrations, only a few play a role in pathogenesis. Therefore, distinguishing between genetic changes that promote cancer progression (i.e., driver mutations) and neutral mutations (i.e., passenger) is like finding needles in haystacks.

Recurrence was a rule of thumb for copy number aberrations (Weir et al., 2007). Thus, it was unforeseen that only a handful of genes would recurrently be targeted by sequence mutations in each cancer type. The current presumption is that the majority of the driver mutations are unique to each tumor. A key unresolved computational challenge is, therefore, to identify the driver mutations associated with each cancer genome. Indeed, the identification of these drivers is required before a differential-network approach can model how the pathogenic behavior emerges. Computational methods addressing this task are still under development (Akavia et al., 2010; Beroukhim et al., 2010; Carter et al., 2009).

Although recurrence may not occur at the gene level, significant recurrence does occur at the level of pathways. For example, in glioblastoma, the majority of tumors have mutations in each of three signaling pathways: P53, retinoblastoma protein 1 (RB1), and rat sarcoma (RAS)/PI3K (Cancer Genome Atlas Research Network, 2008). Because these findings define pathways, rather than genes, as unifying explanations for tumor progression, it is clear that finding drivers will rely on knowledge of molecular networks.

Unfortunately, there is currently insufficient information on pathways in existing databases. First, the majority of signaling proteins are not associated with any known pathway. Second, existing databases include only a small part of what is known and typically do not take context (e.g., cell type) into account. More sophisticated experimental and computational methods will be needed to define and catalog the components involved in each pathway. A promising direction is the use of systematic experimental and computational approaches to build interaction maps (Amit et al., 2009; Bandyopadhyay et al., 2010), which can subsequently be used to identify key aberrant genes. For example, an algorithm known as interactome dysregulation enrichment analysis (IDEA) (Mani et al., 2008) uses a specially derived context-specific molecular network to identify key aberrant genes in lymphoma.

Strategy 2: Discovering Key Network Components Using RNAi
Although naturally occurring genetic alterations help to nominate causal genes in cancer and other diseases, deliberate perturbation greatly facilitates causal gene identification. Taking advantage of sequenced genomes, mammalian RNA interference (RNAi) libraries have emerged as a central tool for systematic perturbation of any gene. Indeed, RNAi-based screens have proven to be a major tool in cancer research in which cell lines are readily available and cell proliferation and survival provide surrogates of tumorigenesis.

In one strategy, unbiased genome-wide RNAi screens in vitro and in vivo are used to identify candidate causative oncogenes and tumor suppressors that affect cell proliferation or survival. Typically, candidate genes that are found to have an aberrant sequence mutation, copy number alteration, or expression change in tumors are selected for deeper mechanistic characterization (Boehm et al., 2007; Ngo et al., 2010). However, one must always keep in mind that candidate genes that are not aberrant may be equally important to study and target therapeutically.

In a second strategy, candidate genes are first selected from cancer genomic data sets and then validated with small-scale RNAi screens. For example, this strategy was recently used to identify critical genes within tumor chromosomal deletions (Ebert et al., 2008) and for finding the small subset of genes that affect metastasis among hundreds selectively expressed in metastatic tumor (Bos et al., 2009).

Finally, unbiased screens can also shed light on the susceptibility or resistance of specific tumors to treatment (Hölzel et al., 2010) and to find ways to enhance the effects of current therapies, such as taxanes (Whitehurst et al., 2007). Indeed, these types of findings can rapidly influence clinical research and practice. In all cases, RNAi serves as a “functional filter” to pinpoint or annotate genes that affect proliferation, death, metastasis, or any cellular processes.

Combining computationally guided experiment design with RNAi screens has enormous untapped potential. Although genome-wide data sets are the most comprehensive, they are also expensive to perform at the large scale that is required to cover all contexts. A more economical approach is to refine our understanding with iterative cycles of experimentation and computation. Computational hypotheses derived from one data set are used to design the experiments for collecting the next data set (Figure 2). For example, protein interaction maps and microarray expression data were used to nominate high-likelihood genes for characterization in an RNAi screen that dissects interactions between influenza and its host (Shapira et al., 2009). This approach deepened our understanding of how the virus manipulates or is controlled by key host defenses through direct and indirect interactions with four major host pathways.

In the cancer setting, a good network model combined with computational inferences can suggest which gene combinations, genetic background, and cell assay (e.g., proliferation, invasion, metabolism) should be matched in searching for new components. For example, multiple mutations must occur together to produce a tumor (Land et al., 1983), necessitating a combinatorial RNAi approach. However, because a large-scale combinatorial RNAi screen is not feasible, computational selection of likely combinations renders the experiments feasible. Additionally, although most screens are performed in a single genetic background, in reality, the functional impact of perturbation is highly dependent on genetic background: disrupting the

expression of a gene can cause death in one cell line and have no effect in another cell line (Luo et al., 2008). Thus, it would be useful to select cell lines with informative genetic backgrounds. Finally, a good model can link genes with specific biological processes (Akavia et al., 2010) and help us efficiently extend RNAi studies to problems of invasion, metabolism, cell-cell interactions, and other cancer hallmarks that are poorly understood (Hanahan and Weinberg, 2011).

Figure 2. Experimental Design for Network Inference
(A) To comprehensively characterize tumor response to a drug, we suggest profiling a cohort of genetically characterized tumors using multiple technologies, following perturbation with small molecules and RNAi. Then, data-driven algorithms can infer differential network models from these data. The inferred models subsequently guide the design of experiments for the next iteration of data collection.
(B) This figure illustrates how different genetic backgrounds and experiments can help to identify driver mutations and network structure. Each identified mutation recurs in a subset of samples, and driver targets are identified by knockdown using RNAi or drug.

Strategy 3: Statistical Identification of Dysregulated Genes and Their Regulators
After discovering key network components, the next step is to decipher the wiring of the network. The majority of the computational work in this area has been through the analysis of tumor gene expression profiles that have accumulated on the order of tens of thousands of microarrays over the past decade. Unlike the top-down strategies described above, here, the approach is bottom up: first identify the differentially expressed genes relevant to a tumor phenotype of interest, and then use these genes to pinpoint the master regulator that brings about their dysregulation.

Data-driven approaches (Principle 1) have been particularly powerful at locating the dysregulated genes and regulatory relations within tumor-related pathways. Analysis of glioblastoma gene expression profiles using ARACNE (algorithm for the reconstruction of accurate cellular networks) (Basso et al., 2005)

revealed two master regulators of mesenchymal transformation in malignant glioma (Carro et al., 2010): the gene module that corresponds to the mesenchymal transformation and the transcription factors most likely regulating this module (based on mutual information between regulator and targets). Both transcription factors were then confirmed experimentally.

By extending this statistical reasoning to higher dimensions, the MINDY (modulator inference by network dynamics) algorithm (Wang et al., 2009) could cleverly identify posttranslational activators and inhibitors of master regulators. Based on the assumption that high (or low) expression of such activators (or inhibitors) would lead to increased (or reduced) coregulation of MYC with its known targets, MINDY uncovered new posttranslational modifiers of MYC in human B lymphocytes, and four of them were validated using RNAi. Demonstrating the generality of the statistical approach, the identified modifiers were found to act by diverse mechanisms, including protein turnover, transcription complex formation, and selective enzyme recruitment.

As we wait for the development of experimental technologies that detect most posttranslational changes in high throughput, thousands of existing mRNA expression data sets can benefit from this powerful statistical approach to predict key modulators of regulatory activity by any biochemical mechanism. We have thus only begun to tap into the potential of these approaches to uncover the regulatory mechanisms that lead to tumors and other pathogenic phenotypes. Moreover, once profiles of cancer proteomes and their posttranslational modifications become more readily available, these methods will be dramatically empowered.

Strategy 4: Integrating Genotype and Gene Expression into Causal Models
Current analysis has only scratched the surface of existing data sets, and there is a critical need for powerful computational approaches to expose the wealth of hidden information. A promising approach is “data integration” that builds a model from diverse data types (e.g., gene sequence, gene expression profiles, and protein-protein interactions), which each shed a different light on the underlying biology. The resulting combination is more than the sum of the parts (see the MiniReview by Ideker et al. on page 860 of this issue). A natural integration that captures the essence of differential networks is sequence and expression.

For example, the CONEXIC (copy number and expression in cancer) algorithm (Akavia et al., 2010) combines DNA copy number with gene expression levels to identify driver mutations and predict the processes that they alter. The modeling assumptions underlying the data integration are: (1) A driver mutation should co-vary with a gene module involved in tumorigenesis (i.e., it assumes that the module’s expression is “modulated” by the driver); and (2) Expression levels of the driver control the malignant phenotype rather than copy number (because other mechanisms may lead to similar dysregulated expression of the driver gene).

This approach predicted two new tumor dependencies in melanoma and the processes that they alter. Moreover, these predictions were then confirmed using RNAi. CONEXIC thus uses gene expression as an intermediary to connect genotype to phenotype, building a cascade of events from DNA, through modulated gene expression, to tumorigenic phenotype. Anchoring the model at the DNA provided support for causality of influence between driver and module, although this influence can still be indirect by a cascade of unknown mechanisms.

Though such modeling approaches have only recently taken hold in cancer genomics, these have been developing in genetic association for a few years. Chen and colleagues identified gene networks that are perturbed by quantitative trait loci (QTL), which in turn lead to metabolic disease (Chen et al., 2008). A single comprehensive computation locates the QTL, identifies how it perturbs the molecular network, and in turn leads to variation in disease traits. As more data types that capture the “state” of the network are collected (e.g., metabolite concentrations using mass spectrometry), these differential-network (Principle 3) approaches will lead to increasingly mechanistic and causal models of disease.

Although this strategy can be applied to any process or disease, cancer is particularly suited for these approaches because somatic mutations driving tumorigenesis typically have a large impact on multiple genes and cellular processes, and thus their effect is more easily detected. Disease genes based on germline mutations that persist through the powerful evolutionary filters are typically more subtle and harder to detect; indeed, disease is frequently invoked only by the combinatorial interaction of many genes.

As proof of concept of “personalized medicine” and using yeast as a model system, CAMELOT (causal modeling with expression linkage for complex traits) (Chen et al., 2009) integrated genotype and gene expression levels (measured prior to drug exposure) to quantitatively predict drug sensitivity. Applying a differential network approach, a small number of causative genes are identified and then used to build regression models to predict drug response for each yeast strain. The algorithm faithfully predicted both the causal genes (24/24 predictions validated) and drug response. Although epistatic relations existed between genes, the statistical simplicity of linear models led to more robust and accurate models from data. We anticipate that a comparable data set from patient tumors (including genotype, basal gene expression, and quantitative drug response) could be used to rationally select each individual patient’s drug treatment, essentially customizing and optimizing patient care.

Strategy 5: Integration of Single Cell Data to Account for Cell-to-Cell Heterogeneity
Whereas the measurements discussed thus far were taken over population aggregates using bulk assays, most signal processing occurs at the level of the individual cell. Over the past decade, studies have repeatedly demonstrated a large degree of heterogeneity between individual cells, even within clonal populations. This variation arises from differences in protein concentrations and stochastic fluctuations in biochemical reactions involving molecules with low copy numbers. A common finding is that a response appears dose dependent in bulk assays but is actually an “all or nothing” response in single cells. That is, the intensity of the single cell response remains constant under dose, but the fraction of the cells that respond increases

with dose (e.g., NF-κB in response to TNFα) (Tay et al., 2010). In these cases, there are a number of distinct subpopulations, and no individual cell behaves in accordance with the population average. Such subpopulations confound network inference algorithms when two molecules exhibit statistical dependency at the population level but actually reside in mutually exclusive cells.

Heterogeneity of molecules at the single cell level can have crucial functional impact. Even clonal cell lines treated with drugs under carefully controlled conditions exhibit a large, previously unappreciated degree of variation in cell survival and other parameters (Cohen et al., 2008). A bulk growth assay can mask a small subpopulation of drug-resistant cells, which can later form a drug-resistant tumor. Though much debate still exists regarding the origins and emergence of these subpopulations, it is clear that such populations often exist in tumors. For example, Sharma and colleagues identified a drug-tolerant state that can be transiently acquired and relinquished through reversible epigenetic changes that occur at low frequency (Sharma et al., 2010b). Therefore, to model drug response in tumors, it is vital to observe the system at the single cell level and take heterogeneity (stochastic, genetic, and microenvironment) into account.

A unique and beneficial feature of single cell data is the simultaneous observation of multiple signaling proteins in each individual cell. The stochastic variation observed across individual cells can be harnessed as a data-rich source for network inference, in which each of many thousands of cells can be treated as an individual sample (Sachs et al., 2005). This strategy provides significantly more samples than are available in bulk assays (e.g., each microarray is only a single sample).

Nevertheless, this amount of data comes with a technical tradeoff. To identify interactions and their function, the participating signaling proteins need to be measured simultaneously in the same sample. Typically, single cell measurement technologies are limited to a small number of simultaneous channels (approximately four to ten channels for flow cytometry and approximately three channels for microscopy), with microscopy having the unique advantage of real-time tracking across space and time. A promising emerging technology is mass spectrometry-based single cell cytometry (Ornatsky et al., 2008), which currently can measure up to 35 antibodies in a single cell, with the potential to scale up to 100. This approach will likely break new ground by enabling the study of midscale networks in individual cells. We hope and must rely on clever chemists, engineers, and physicists to take on this important challenge of measuring many molecular states in live, single cells over time and space.

In the meantime, computational approaches can help bridge the gap by: (1) pointing to a small number of key components in a differential network, which would be valuable to analyze at the single cell level, and (2) stitching together small, overlapping subnetworks into larger network models (Sachs et al., 2009). But there remains a need to develop methods for integrating genomic data sets at the population level with single cell measurements over small subsets of components at critical network junctures, leading to a more accurate model of the underlying cellular computations.

Strategy 6: Using Perturbations to Reveal Network Wiring
To infer network models that describe how a network responds to stimuli, as well as through what molecular interactions and mechanisms this sensing and response occurs, comprehensive profiles must be measured following perturbations. We consider three methods to perturb the system: RNAi, drugs, and natural variation. As this strategy is still under development, this section is more speculative.

Measuring network behavior following an RNAi perturbation uncovers the functions of a gene and provides definitive causal links between network components. A key strength of RNAi is that it can be used effectively to target any desired gene. However, RNAi also has limitations due to its slow kinetics and potential nonspecific cellular responses (e.g., innate immune response to double-stranded RNA, overloading of the RNAi machinery, and off-target effects). Using RNAi-based perturbations followed by comprehensive measurements, Amit et al. (2009) recently developed a network model of transcriptional regulation in the pathogen-sensing response. Candidate regulators and a reduced signature response were first selected from microarray data of cells stimulated with pathogens. Each candidate was then knocked down with RNAi, and the effect on the signature was quantified. This strategy uncovered many new factors involved in pathogen sensing and generated an informative network wiring diagram that revealed new crosstalk and feedback in these pathways. This strategy and its variations should succeed in reconstructing medium-size molecular networks in other systems.

A second perturbation to consider is small molecules, which often have unique and valuable properties for network modeling and direct relevance to patient care. First, in contrast to RNAi kinetics, the instantaneous action of small molecules allows for accurate control of both dose and timing, leading to simpler interpretations of its effects, without the need to consider network adaptation. Second, small molecules can have specific biochemical effects on proteins, leading to elimination of edges in the network, rather than entire nodes as RNAi does. By comprehensively monitoring the resulting changes in the network upon drug perturbation, we can refine network models and, importantly, discover how pathway activation, crosstalk, and feedback differ across individual tumors with variable levels of drug sensitivity.

Third, variation in the DNA across individuals is a powerful resource for studying the effects of perturbation on network function. It is also effective for detecting regulatory interactions, uncovering complex phenotypes, and inferring networks (Lee et al., 2006). In contrast to deliberate and somewhat dramatic disruption of a gene’s function through RNAi or drugs, more subtle effects, such as the attenuation or alteration of function, can be observed in genetically divergent individuals. Natural variation provides us with numerous genetic alterations in various combinations, as selected by evolution to produce functional pathways. By monitoring functional pathways in action, we can infer how network components work together under different conditions. Each individual’s genetic variation provides distinct information linking genotype and phenotype and helps to explain network behavior.
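
The knockdown-and-quantify step described above for RNAi perturbations can be illustrated with a small sketch. This is not the pipeline of Amit et al. (2009); it is a minimal, hypothetical example that assumes expression replicates are available for a handful of signature genes with and without each knockdown, and it calls an edge whenever the log2 fold change exceeds an arbitrary cutoff. The function name `knockdown_edges`, the parameter `min_abs_lfc`, the gene and regulator names, and all numbers are invented for illustration.

```python
from math import log2
from statistics import mean


def knockdown_edges(control, knockdowns, min_abs_lfc=1.0):
    """Infer signed regulator -> target edges from knockdown expression data.

    control:    {target_gene: [expression replicates without knockdown]}
    knockdowns: {regulator: {target_gene: [replicates after knockdown]}}
    Returns {(regulator, target): (sign, log2_fold_change)} for every pair
    whose absolute log2 fold change is at least min_abs_lfc.
    """
    edges = {}
    for regulator, profile in knockdowns.items():
        for target, kd_values in profile.items():
            lfc = log2(mean(kd_values) / mean(control[target]))
            if abs(lfc) < min_abs_lfc:
                continue  # effect too small to call an edge
            # Expression drops when the regulator is removed -> activator;
            # expression rises -> repressor.
            sign = "activates" if lfc < 0 else "represses"
            edges[(regulator, target)] = (sign, lfc)
    return edges


# Toy numbers, invented purely for illustration.
control = {
    "geneA": [100, 110, 90],
    "geneB": [50, 55, 45],
    "geneC": [200, 210, 190],
}
knockdowns = {
    "TF1": {"geneA": [25, 30, 20], "geneB": [52, 48, 50], "geneC": [400, 420, 380]},
    "TF2": {"geneA": [95, 105, 100], "geneB": [10, 15, 12.5], "geneC": [205, 195, 200]},
}

edges = knockdown_edges(control, knockdowns)
for (regulator, target), (sign, lfc) in sorted(edges.items()):
    print(f"{regulator} {sign} {target} (log2 fold change {lfc:+.2f})")
```

A fixed fold-change cutoff is the simplest possible choice; a real analysis would need replicate-aware statistics and controls for the off-target and nonspecific effects of RNAi noted above, and an edge found this way may still be indirect.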

What still needs to be developed is an integrated experimental-computational strategy that combines stimulations and perturbations with functional measurements from the same cells to build network models. Variation in stimuli and environment allows us to derive what the network is computing, and perturbations to its components elucidate how the network is computing. This suggests expanding the framework set forth by Amit and colleagues (Amit et al., 2009) to additional dimensions, including a time series of gene expression and proteomic measurements, following each combination of stimuli and perturbations. Natural variation between individuals and tumors combined with targeted perturbations using RNAi or drugs will provide particularly powerful data for deriving tumor network models.

Executing the experimental design proposed above requires technological developments. Much of the dynamics occurs at the level of proteins and their modifications, raising the need for high-throughput proteomics to measure protein abundances and activity states. Importantly, the proposed design requires assaying a prohibitively large number of samples. To make significant progress in the understanding of molecular networks, there is a critical need for the development of more economical multiplex functional assays that can measure thousands of molecular species per sample at low sample cost. An iterative approach, in which computational modeling with existing data guides the selection of the next set of experiments, will provide the most cost-effective design (Figure 2).

New experimental technologies are rapidly progressing, with computational efforts lagging behind. For example, generating transcriptome sequence reads is easy, but their assembly remains challenging. To utilize the enormous potential of the data types delineated above, significant advances in computational modeling are required. Specifically, there is a need for a transition from static and qualitative models to temporal and quantitative models.

Future: Personalized Cancer Medicine
Networks govern fundamental processes, such as the development of a multicellular organism from a single cell and communication between immune cells in response to a pathogen. Fueled by technology and computation, research in the coming decade is expected to unravel the details and principles behind diverse molecular networks and how they compute life’s functions. For example, the ongoing revolution that has enabled the sequencing of individuals provides the first opportunities to systematically study and explain how DNA variation results in our phenotypic diversity. Reaching these goals, however, will also necessitate a deeper understanding of the biophysical principles underlying signal processing in small biological circuits and how these come together in systems of increasing size and complexity.

Within cancer research, systems biology is dramatically advancing our mechanistic understanding of tumor progression and the design of personalized therapeutics. Continued success, however, will depend on critical advances in both experimental and computational methods. Improvements in tools for measurement (especially mass spectrometry and cost-effective multiplex detection) and perturbation (especially RNAi and small molecules) will fill in our understanding of the many molecular layers that underlie network function. On the computational end, the key bottleneck is the development of validated computational methods that integrate heterogeneous data and build differential-network models on a per-tumor basis. These methods are required to: (1) identify the genetic aberrations and the master regulators that drive proliferation, survival, metastasis, and drug resistance; (2) model the adaptive/feedback mechanisms that thwart the efficacy of potent drugs; and (3) predict additional target pathways for combinatorial drug treatment. Based on these predictions, more data can be collected to refine the models in iterative rounds of computation and experiments. As three-dimensional models of cancer (Ridky et al., 2010) continue to develop, we can also profile multiple cell types in a tumor environment and model the interactions between these. In short, these studies should teach us what drives cancers and what part of the networks we should target, both initially and after the network adapts and mutates.

Many of us believe that the ultimate solutions to minimizing cancer reside in the regime of combinatorial patient-specific drug therapy, immunotherapy, and gene therapy. Accurate quantitative models of tumor networks should predict the effects of drug perturbations and thus enable sophisticated rational therapy with optimized dosage, timing, and drug combination for each individual tumor. Drug combinations can address feedback and network adaptation, ensuring shutdown of the necessary pathways. Additionally, drug combinations can target distinct subpopulations within a tumor.

Tumor networks are armed with the ability to adapt and rapidly evolve and, thus, are a powerful adversary. These need to be met with equally sophisticated and flexible therapy regimes that can track these adaptations and dynamically adapt over time, placing us several moves ahead of the tumor. Studying the emergence of drug resistance both in vitro (Johannessen et al., 2010) and in vivo can better inform methods to anticipate potential paths of resistance. The ultimate therapies would involve sending “networks” in vivo to track tumor behavior and control the dosage and timing of drug release in response to tumor behavior. This long-term goal should become feasible as the fields of network biology, synthetic biology, and appropriate drug delivery methods mature. In the immediate future, however, our goal should be to anticipate and monitor real-time changes in the tumor’s network and adapt our therapies accordingly.

ACKNOWLEDGMENTS

The authors would like to thank Arnon Arazi, Andrea Califano, William Hahn, Andreja Jovic, Oren Litvin, Neal Rosen, Sagi Shapira, and Cathy Wu for valuable comments. The authors would like to thank Oren Litvin for help with the illustrations. This research was supported by the NIH Director’s New Innovator Award Program through grant numbers DP2-OD002414-01 (D.P.) and DP2 OD002230 (N.H.), as well as NIAID U54 AI057159 (N.H.). D.P. holds a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and Packard Fellowship for Science and Engineering.

REFERENCES

Akavia, U.D., Litvin, O., Kim, J., Sanchez-Garcia, F., Kotliar, D., Causton, H.C., Pochanard, P., Mozes, E., Garraway, L.A., and Pe’er, D. (2010). An integrated approach to uncover drivers of cancer. Cell 143, 1005–1017.

Amit, I., Citri, A., Shay, T., Lu, Y., Katz, M., Zhang, F., Tarcic, G., Siwak, D., Lahad, J., Jacob-Hirsch, J., et al. (2007). A module of negative feedback regulators defines growth factor signaling. Nat. Genet. 39, 503–512.
Amit, I., Garber, M., Chevrier, N., Leite, A.P., Donner, Y., Eisenhaure, T., Guttman, M., Grenier, J.K., Li, W., Zuk, O., et al. (2009). Unbiased reconstruction of a mammalian transcriptional network mediating pathogen responses. Science 326, 257–263.
Bandyopadhyay, S., Chiang, C.Y., Srivastava, J., Gersten, M., White, S., Bell, R., Kurschner, C., Martin, C.H., Smoot, M., Sahasrabudhe, S., et al. (2010). A human MAP kinase interactome. Nat. Methods 7, 801–805.
Basso, K., Margolin, A.A., Stolovitzky, G., Klein, U., Dalla-Favera, R., and Califano, A. (2005). Reverse engineering of regulatory networks in human B cells. Nat. Genet. 37, 382–390.
Beroukhim, R., Mermel, C.H., Porter, D., Wei, G., Raychaudhuri, S., Donovan, J., Barretina, J., Boehm, J.S., Dobson, J., Urashima, M., et al. (2010). The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905.
Boehm, J.S., Zhao, J.J., Yao, J., Kim, S.Y., Firestein, R., Dunn, I.F., Sjostrom, S.K., Garraway, L.A., Weremowicz, S., Richardson, A.L., et al. (2007). Integrative genomic approaches identify IKBKE as a breast cancer oncogene. Cell 129, 1065–1079.
Bonneau, R., Facciotti, M.T., Reiss, D.J., Schmid, A.K., Pan, M., Kaur, A., Thorsson, V., Shannon, P., Johnson, M.H., Bare, J.C., et al. (2007). A predictive model for transcriptional control of physiology in a free living cell. Cell 131, 1354–1365.
Bos, P.D., Zhang, X.H., Nadal, C., Shu, W., Gomis, R.R., Nguyen, D.X., Minn, A.J., van de Vijver, M.J., Gerald, W.L., Foekens, J.A., and Massagué, J. (2009). Genes that mediate breast cancer metastasis to the brain. Nature 459, 1005–1009.
Cancer Genome Atlas Research Network. (2008). Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068.
Carracedo, A., Ma, L., Teruya-Feldstein, J., Rojo, F., Salmena, L., Alimonti, A., Egia, A., Sasaki, A.T., Thomas, G., Kozma, S.C., et al. (2008). Inhibition of mTORC1 leads to MAPK pathway activation through a PI3K-dependent feedback loop in human cancer. J. Clin. Invest. 118, 3065–3074.
Carro, M.S., Lim, W.K., Alvarez, M.J., Bollo, R.J., Zhao, X., Snyder, E.Y., Sulman, E.P., Anne, S.L., Doetsch, F., Colman, H., et al. (2010). The transcriptional network for mesenchymal transformation of brain tumours. Nature 463, 318–325.
Carter, H., Chen, S., Isik, L., Tyekucheva, S., Velculescu, V.E., Kinzler, K.W., Vogelstein, B., and Karchin, R. (2009). Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations. Cancer Res. 69, 6660–6667.
Chen, Y., Zhu, J., Lum, P.Y., Yang, X., Pinto, S., MacNeil, D.J., Zhang, C., Lamb, J., Edwards, S., Sieberts, S.K., et al. (2008). Variations in DNA elucidate molecular networks that cause disease. Nature 452, 429–435.
Chen, B.J., Causton, H.C., Mancenido, D., Goddard, N.L., Perlstein, E.O., and Pe'er, D. (2009). Harnessing gene expression to identify the genetic basis of drug resistance. Mol. Syst. Biol. 5, 310.
Cohen, A.A., Geva-Zatorsky, N., Eden, E., Frenkel-Morgenstern, M., Issaeva, I., Sigal, A., Milo, R., Cohen-Saidon, C., Liron, Y., Kam, Z., et al. (2008). Dynamic proteomics of individual cancer cells in response to a drug. Science 322, 1511–1516.
Cohen-Saidon, C., Cohen, A.A., Sigal, A., Liron, Y., and Alon, U. (2009). Dynamics and variability of ERK2 response to EGF in individual living cells. Mol. Cell 36, 885–893.
Ebert, B.L., Pretz, J., Bosco, J., Chang, C.Y., Tamayo, P., Galili, N., Raza, A., Root, D.E., Attar, E., Ellis, S.R., and Golub, T.R. (2008). Identification of RPS14 as a 5q- syndrome gene by RNA interference screen. Nature 451, 335–339.
Flaherty, K.T., Puzanov, I., Kim, K.B., Ribas, A., McArthur, G.A., Sosman, J.A., O'Dwyer, P.J., Lee, R.J., Grippo, J.F., Nolop, K., and Chapman, P.B. (2010). Inhibition of mutated, activated BRAF in metastatic melanoma. N. Engl. J. Med. 363, 809–819.
Friedman, N., Linial, M., Nachman, I., and Pe'er, D. (2000). Using Bayesian networks to analyze expression data. J. Comput. Biol. 7, 601–620.
Hanahan, D., and Weinberg, R.A. (2011). Hallmarks of cancer: The next generation. Cell 144, 646–674.
Hölzel, M., Huang, S., Koster, J., Ora, I., Lakeman, A., Caron, H., Nijkamp, W., Xie, J., Callens, T., Asgharzadeh, S., et al. (2010). NF1 is a tumor suppressor in neuroblastoma that determines retinoic acid response and disease outcome. Cell 142, 218–229.
Irish, J.M., Hovland, R., Krutzik, P.O., Perez, O.D., Bruserud, O., Gjertsen, B.T., and Nolan, G.P. (2004). Single cell profiling of potentiated phospho-protein networks in cancer cells. Cell 118, 217–228.
Johannessen, C.M., Boehm, J.S., Kim, S.Y., Thomas, S.R., Wardwell, L., Johnson, L.A., Emery, C.M., Stransky, N., Cogdill, A.P., Barretina, J., et al. (2010). COT drives resistance to RAF inhibition through MAP kinase pathway reactivation. Nature 468, 968–972.
Land, H., Parada, L.F., and Weinberg, R.A. (1983). Tumorigenic conversion of primary embryo fibroblasts requires at least two cooperating oncogenes. Nature 304, 596–602.
Lee, S.I., Pe'er, D., Dudley, A.M., Church, G.M., and Koller, D. (2006). Identifying regulatory mechanisms using individual variation reveals key role for chromatin modification. Proc. Natl. Acad. Sci. USA 103, 14062–14067.
Litvin, O., Causton, H.C., Chen, B.J., and Pe'er, D. (2009). Modularity and interactions in the genetics of gene expression. Proc. Natl. Acad. Sci. USA 106, 6441–6446.
Luo, B., Cheung, H.W., Subramanian, A., Sharifnia, T., Okamoto, M., Yang, X., Hinkle, G., Boehm, J.S., Beroukhim, R., Weir, B.A., et al. (2008). Highly parallel identification of essential genes in cancer cells. Proc. Natl. Acad. Sci. USA 105, 20380–20385.
Maher, C.A., Kumar-Sinha, C., Cao, X., Kalyana-Sundaram, S., Han, B., Jing, X., Sam, L., Barrette, T., Palanisamy, N., and Chinnaiyan, A.M. (2009). Transcriptome sequencing to detect gene fusions in cancer. Nature 458, 97–101.
Mani, K.M., Lefebvre, C., Wang, K., Lim, W.K., Basso, K., Dalla-Favera, R., and Califano, A. (2008). A systems biology approach to prediction of oncogenes and molecular perturbation targets in B-cell lymphomas. Mol. Syst. Biol. 4, 169.
Ng, S.B., Buckingham, K.J., Lee, C., Bigham, A.W., Tabor, H.K., Dent, K.M., Huff, C.D., Shannon, P.T., Jabs, E.W., Nickerson, D.A., et al. (2010). Exome sequencing identifies the cause of a mendelian disorder. Nat. Genet. 42, 30–35.
Ngo, V.N., Young, R.M., Schmitz, R., Jhavar, S., Xiao, W., Lim, K.H., Kohlhammer, H., Xu, W., Yang, Y., Zhao, H., et al. (2010). Oncogenically active MYD88 mutations in human lymphoma. Nature 470, 115–119.
O'Reilly, K.E., Rojo, F., She, Q.B., Solit, D., Mills, G.B., Smith, D., Lane, H., Hofmann, F., Hicklin, D.J., Ludwig, D.L., et al. (2006). mTOR inhibition induces upstream receptor tyrosine kinase signaling and activates Akt. Cancer Res. 66, 1500–1508.
Ornatsky, O.I., Lou, X., Nitz, M., Schäfer, S., Sheldrick, W.S., Baranov, V.I., Bandura, D.R., and Tanner, S.D. (2008). Study of cell antigens and intracellular DNA by identification of element-containing labels and metallointercalators using inductively coupled plasma mass spectrometry. Anal. Chem. 80, 2539–2547.
Pe'er, D., Regev, A., Elidan, G., and Friedman, N. (2001). Inferring subnetworks from perturbed expression profiles. Bioinformatics 17 (Suppl 1), S215–S224.
Poulikakos, P.I., Zhang, C., Bollag, G., Shokat, K.M., and Rosen, N. (2010). RAF inhibitors transactivate RAF dimers and ERK signalling in cells with wild-type BRAF. Nature 464, 427–430.
Ridky, T.W., Chow, J.M., Wong, D.J., and Khavari, P.A. (2010). Invasive three-dimensional organotypic neoplasia from multiple normal human epithelia. Nat. Med. 16, 1450–1455.

Sachs, K., Perez, O., Pe'er, D., Lauffenburger, D.A., and Nolan, G.P. (2005). Causal protein-signaling networks derived from multiparameter single-cell data. Science 308, 523–529.
Sachs, K., Itani, S., Carlisle, J., Nolan, G.P., Pe'er, D., and Lauffenburger, D.A. (2009). Learning signaling network structures with sparsely distributed data. J. Comput. Biol. 16, 201–212.
Segal, E., Shapira, M., Regev, A., Pe'er, D., Botstein, D., Koller, D., and Friedman, N. (2003). Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat. Genet. 34, 166–176.
Shapira, S.D., Gat-Viks, I., Shum, B.O., Dricot, A., de Grace, M.M., Wu, L., Gupta, P.B., Hao, T., Silver, S.J., Root, D.E., et al. (2009). A physical and regulatory map of host-influenza interactions reveals pathways in H1N1 infection. Cell 139, 1255–1267.
Sharma, S.V., Haber, D.A., and Settleman, J. (2010a). Cell line-based platforms to evaluate the therapeutic efficacy of candidate anticancer agents. Nat. Rev. Cancer 10, 241–253.
Sharma, S.V., Lee, D.Y., Li, B., Quinlan, M.P., Takahashi, F., Maheswaran, S., McDermott, U., Azizian, N., Zou, L., Fischbach, M.A., et al. (2010b). A chromatin-mediated reversible drug-tolerant state in cancer cell subpopulations. Cell 141, 69–80.
Stephens, P.J., McBride, D.J., Lin, M.L., Varela, I., Pleasance, E.D., Simpson, J.T., Stebbings, L.A., Leroy, C., Edkins, S., Mudie, L.J., et al. (2009). Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature 462, 1005–1010.
Tay, S., Hughey, J.J., Lee, T.K., Lipniacki, T., Quake, S.R., and Covert, M.W. (2010). Single-cell NF-kappaB dynamics reveal digital activation and analogue information processing. Nature 466, 267–271.
Wang, K., Saito, M., Bisikirska, B.C., Alvarez, M.J., Lim, W.K., Rajbhandari, P., Shen, Q., Nemenman, I., Basso, K., Margolin, A.A., et al. (2009). Genome-wide identification of post-translational modulators of transcription factor activity in human B cells. Nat. Biotechnol. 27, 829–839.
Weir, B.A., Woo, M.S., Getz, G., Perner, S., Ding, L., Beroukhim, R., Lin, W.M., Province, M.A., Kraja, A., Johnson, L.A., et al. (2007). Characterizing the cancer genome in lung adenocarcinoma. Nature 450, 893–898.
Whitehurst, A.W., Bodemann, B.O., Cardenas, J., Ferguson, D., Girard, L., Peyton, M., Minna, J.D., Michnoff, C., Hao, W., Roth, M.G., et al. (2007). Synthetic lethal screen identification of chemosensitizer loci in cancer cells. Nature 446, 815–819.

Leading Edge Primer
Cell 144, March 18, 2011 ©2011 Elsevier Inc.

Modeling the Cell Cycle: Why Do Certain Circuits Oscillate?

James E. Ferrell, Jr.,1,2,* Tony Yu-Chen Tsai,1,2 and Qiong Yang1
1Department of Chemical and Systems Biology
2Department of Biochemistry
Stanford University School of Medicine, Stanford, CA 94305-5174, USA
*Correspondence: [email protected]
DOI 10.1016/j.cell.2011.03.006

Computational modeling and the theory of nonlinear dynamical systems allow one to not simply describe the events of the cell cycle, but also to understand why these events occur, just as the theory of gravitation allows one to understand why cannonballs fly in parabolic arcs. The simplest examples of the eukaryotic cell cycle operate like autonomous oscillators. Here, we present the basic theory of oscillatory biochemical circuits in the context of the Xenopus embryonic cell cycle. We examine Boolean models, delay differential equation models, and especially ordinary differential equation (ODE) models. For ODE models, we explore what it takes to get oscillations out of two simple types of circuits (negative feedback loops and coupled positive and negative feedback loops). Finally, we review the procedures of linear stability analysis, which allow one to determine whether a given ODE model and a particular set of kinetic parameters will produce oscillations.

In many eukaryotic cells, the cell cycle proceeds as a sequence of contingent events. A new cell must first grow to a sufficient size before it can begin DNA replication. Then, the cell must complete DNA replication before it can begin mitosis. Finally, the cell must successfully organize a metaphase spindle before it can complete mitosis and begin the cycle again. If cell growth, DNA replication, or spindle assembly is slowed down, the entire cell cycle slows. Thus, this type of cell cycle is like an "assembly line" or "succession of dominoes" (Hartwell and Weinert, 1989; Murray and Kirschner, 1989b).

However, some cell cycles are qualitatively different in terms of their dynamics. Most notable of these exceptions is the early embryonic cell cycle in the amphibian Xenopus laevis. DNA replication is not contingent upon cell growth, probably because the frog egg is so big to start with. Mitotic entry is not contingent upon completion of DNA replication, and mitotic exit is not contingent upon the successful assembly of a metaphase spindle because the relevant checkpoints are ineffective in the context of the embryo's high cytoplasm:nucleus ratio (Dasso and Newport, 1990; Minshull et al., 1994). Lacking these contingencies, the early embryo simply pulses once every 25 min, irrespective of whether the endpoints of the cell cycle (DNA replication and mitosis) have been completed (Hara et al., 1980). Thus, this cell cycle is clock-like (Murray and Kirschner, 1989b); it behaves as if it is being driven by an autonomous biochemical oscillator.

Although many biological processes seem almost unfathomably complex and incomprehensible, oscillators and clocks are the types of processes that we might have a good chance of not just describing, but also understanding. Accordingly, much effort has gone into understanding how simple cell cycles work in model systems like Xenopus embryos and the fungi S. pombe and S. cerevisiae. This requires the identification of the proteins and genes needed for the embryonic cell cycle and the elucidation of the regulatory processes that connect these proteins and genes. Over the past three decades, enormous progress has been made toward these ends. In each case, the cell cycle is driven by a protein circuit centered on the cyclin-dependent protein kinase CDK1 and the anaphase-promoting complex (APC) (Figure 1A). The activation of CDK1 drives the cell into mitosis, whereas the activation of APC, which generally lags behind CDK1, drives the cell back out (Figure 1B). There are still some missing components and poorly understood connections, but overall, the cell-cycle network is fairly well mapped out.

But a satisfying understanding of why the CDK1/APC system oscillates requires more than a description of components and connections; it requires an understanding of why any regulatory circuit would oscillate instead of simply settling down into a stable steady state. What types of biochemical circuits can oscillate, and what is required of the individual components of the circuit to permit oscillations? Such insights are provided by the theory of nonlinear dynamics and by computational modeling.

Indeed, cell-cycle modeling has become a very popular pursuit. Hundreds of models have been published (Table 1), beginning with Kauffman, Wille, and Tyson's prescient proposal that the cell cycle of the yellow slime mold Physarum polycephalum is driven by a relaxation oscillator (Kauffman and Wille, 1975; Tyson and Kauffman, 1975). Many of the early models, and a few of the more recent models, were simple, as models in physics typically are. They consisted of a small number of ordinary differential equations relating a few time-dependent variables (e.g., protein concentrations or activities) to each other and to a few time-independent kinetic parameters.

The purpose of this type of modeling is to understand in simpler, albeit more abstract, terms how and why the cell cycle works.

Through time, many of the models have become more complicated and more like chemical engineering models, consisting of dozens of variables and regulatory processes. The purpose of this type of modeling is to account for and test our understanding of specific details of the system that, because of the complexity of the system, cannot always be understood through intuition. This type of detailed model has successfully accounted for the phenotypes of dozens of budding yeast mutants (Chen et al., 2004).

Both types of modeling have their place in understanding cell-cycle regulation, and both have their adherents. Modeling approaches range from simple Boolean modeling to stochastic modeling and partial differential equation modeling. However, to date, the majority of effort has focused on ordinary differential equation (ODE) modeling (Table 1), which gets at the basic solution phase biochemistry of cell-cycle regulation.

Here, we address the question of what it takes to make a simple protein circuit like the CDK1/APC system oscillate. We will start with Boolean modeling, which provides intuition into the logic of biochemical oscillators. We then move on to ODE models, which translate this logic into chemical terms. The basic methods for analyzing ODE models of oscillators are well known in the field of nonlinear dynamics but are not so well known among biologists. We believe that it is high time that they were; after all, we biologists are studying what are probably the world's most interesting nonlinear dynamical systems. We will emphasize the basic concepts of oscillator function and, to the extent possible, keep the algebra to a minimum. For further information, the reader is directed to lucid reviews by Goldbeter (Goldbeter, 2002) and Novák and Tyson (Novák and Tyson, 2008; Tyson et al., 2003), as well as Strogatz's outstanding textbook (Strogatz, 1994).

Figure 1. Simplified Depiction of the Embryonic Cell Cycle, Highlighting the Main Regulatory Loops
(A) Cyclin-CDK1 is the master regulator of mitosis. APC-Cdc20 is an E3 ubiquityl ligase, which marks mitotic cyclins for degradation by the proteasome. Wee1 is a protein kinase that inactivates cyclin-CDK1. Cdc25 is a phosphoprotein phosphatase that activates cyclin-CDK1. Not shown here is Plk1, which cooperates with cyclin-CDK1 in the activation of APC-Cdc20.
(B) In the Xenopus embryo, the activation of CDK1 drives the cell into mitosis, whereas the activation of APC, which generally lags behind CDK1, drives the cell back out of mitosis.

Boolean Models

We begin by paring the cell cycle down to a simple two-component model in which CDK1 activates APC and APC inactivates CDK1 (Figure 2B). This is the essential negative feedback loop upon which the cell-cycle oscillator is built (Murray et al., 1989). Perhaps the simplest way to think about the dynamics of a system like this is through Boolean or logical analysis (Glass and Kauffman, 1973). Suppose that both CDK1 and APC are perfectly switch-like in their regulation; that is, they are either completely on or completely off. Then, together, the system of CDK1 plus APC has four possible discrete states (APCon/CDK1on, APCon/CDK1off, APCoff/CDK1on, and APCoff/CDK1off) (Figure 2E). Now suppose the system starts in an interphase-like state, with APCoff/CDK1off. In the first increment of time, what will happen? If the APC is off, then CDK1 turns on. Thus, we define a rule: state 1, with APCoff/CDK1off, goes to state 2 with APCoff/CDK1on. Next, the active CDK1 activates APC; thus, state 2 goes to 3. The active APC then inactivates CDK1, and state 3 goes to state 4. Finally, in the absence of active CDK1, the APC becomes inactive, and state 4 goes to state 1. This completes the cycle. We can depict the dynamics of this oscillator as a diagram in "state space" (Figure 2E). The model goes through a never-ending cycle, and all of the possible states of the system are visited during each run through the cycle.

If we add one more component to the system (for example, a protein like Polo-like kinase 1 (Plk1), which here we assume is activated by CDK1 and, in turn, contributes to the activation of APC; Figure 2C), then there are eight (2 × 2 × 2) possible states for the system. If we start with all of the proteins off and assume six biologically reasonable rules (active CDK1 activates Plk1, active Plk1 activates APC, active APC inactivates CDK1, etc.), once again we get a never-ending cycle of states (Figure 2F). But this time, only some of the possible states (states 1–6 in Figure 2F) lie on the cycle. The other two states (7 and 8) feed into the cycle in a manner determined by the rules we assume. Thus, no matter where the system starts, it will converge to the cycle sooner or later. The behavior of this Boolean model is analogous to "limit cycle oscillations," which we will encounter again in the next section.

With Boolean models, it is easy to obtain oscillations. Indeed, one can even get oscillations from a model with a single species (CDK1) that flips on when it is off and flips off when it is on (Figures 2A and 2D), a discrete representation of a protein that negatively regulates itself.

Table 1. Some Mathematical Models of the Eukaryotic Cell Cycle

Year | Organism/Cell Type | Type of Model | Reference
1970 | No specific organism | ODE | (Sel’kov, 1970)
1974 | No specific organism | ODE | (Gilbert, 1974)
1975 | Physarum polycephalum | ODE | (Kauffman and Wille, 1975)
1975 | Physarum polycephalum | ODE | (Tyson and Kauffman, 1975)
1991 | Xenopus laevis embryos | ODE | (Goldbeter, 1991)
1991 | Xenopus embryos | ODE | (Norel and Agur, 1991)
1991 | Xenopus embryos, somatic cells | ODE | (Tyson, 1991)
1992 | Xenopus embryos | ODE | (Obeyesekere et al., 1992)
1993 | Xenopus embryos | ODE | (Novak and Tyson, 1993a)
1993 | Xenopus embryos | ODE | (Novak and Tyson, 1993b)
1994 | Xenopus embryos | ODE, delay differential equations | (Busenberg and Tang, 1994)
1996 | Xenopus embryos | ODE | (Goldbeter and Guilmot, 1996)
1997 | S. pombe | ODE | (Novak and Tyson, 1997)
1998 | S. pombe | ODE | (Novak et al., 1998)
1998 | Xenopus embryos | ODE | (Borisuk and Tyson, 1998)
1999 | Mammalian somatic cells | ODE | (Aguda and Tang, 1999)
2003 | Xenopus embryos | ODE | (Pomerening et al., 2003)
2003 | S. cerevisiae | ODE | (Ciliberto et al., 2003)
2004 | S. cerevisiae | ODE | (Chen et al., 2004)
2004 | S. cerevisiae | Boolean | (Li et al., 2004)
2004 | S. pombe | Stochastic | (Steuer, 2004)
2005 | Xenopus embryos | ODE | (Pomerening et al., 2005)
2006 | Mammalian somatic cells | Delay differential equations | (Srividhya and Gopinathan, 2006)
2006 | S. cerevisiae | Stochastic | (Zhang et al., 2006)
2007 | S. cerevisiae | Stochastic | (Braunewell and Bornholdt, 2007)
2007 | S. cerevisiae | Stochastic | (Okabe and Sasai, 2007)
2007 | S. cerevisiae | Hybrid | (Barberis et al., 2007)
2008 | Xenopus embryos | ODE | (Tsai et al., 2008)
2008 | S. cerevisiae | Stochastic | (Ge et al., 2008)
2008 | S. cerevisiae | Stochastic | (Mura and Csikász-Nagy, 2008)
2008 | S. pombe | Boolean | (Davidich and Bornholdt, 2008)
2008 | Mammalian somatic cells | ODE | (Yao et al., 2008)
2009 | Mammalian somatic cells | ODE | (Alfieri et al., 2009)
2010 | S. cerevisiae | ODE | (Charvin et al., 2010)
2010 | S. cerevisiae, S. pombe | Boolean | (Mangla et al., 2010)
2010 | S. pombe | ODE | (Li et al., 2010)

ODE Models of the CDK1/APC System

Although Boolean analysis is simple and appealing, it is not completely realistic. First, all three Boolean models with negative feedback loops (Figures 2A–2C) yielded oscillations even though we know that real negative feedback loops do not always oscillate. The problem is the simplifying assumptions that underpin Boolean analysis: the discrete activity states and time steps. Even if individual CDK1 and APC molecules actually flip between discrete on/off states, a cell contains a number of CDK1 and APC molecules, and they would not be expected to all flip simultaneously.

The framework for describing the dynamics of such a system is chemical kinetic theory, and, assuming that the numbers of CDK1 and APC molecules are large, the activation and inactivation of CDK1 and APC can be described by a set of differential equations. Here, we will build up an ODE model of the system, starting with a one-ODE model, which fails to produce oscillations. We then add additional complexity to the ODEs until the model succeeds in producing sustained, limit cycle oscillations.

A One-ODE Model

By definition, the rate of change of active CDK1 (denoted CDK1*) is the rate of CDK1 activation minus the rate of CDK1 inactivation. For simplicity, we will assume that CDK1 is activated by the rapid, high-affinity binding of cyclin, which is being

Figure 2. Boolean Models of CDK1 Regulation
(A–C) Schematic representation of negative feedback loops composed of one (A), two (B), or three (C) species.
(D–F) Trajectories in state space for Boolean models of these three negative feedback systems. Solid lines represent limit cycles; dashed lines (in F) connect the states off the limit cycle to the limit cycle.

synthesized at a constant rate of a1 (the first term of Equation 1). For CDK1 inactivation, we will assume mass action kinetics (the second term of Equation 1). This gives us the first-order differential equation:

$$\frac{d\,\mathrm{CDK1}^*}{dt} = a_1 - b_1\,\mathrm{CDK1}^*\,\mathrm{APC}^* \qquad \text{[Equation 1]}$$

There are two time-dependent variables, CDK1* and APC*. To allow the system to be described by an ODE with a single time-dependent variable (Figure 3A), we assume that the activity of APC is regulated rapidly enough by CDK1* so that it can be considered an instantaneous function of CDK1*. What functional form should we use for APC's response function? Here, we will assume that APC's response to CDK1* is ultrasensitive (sigmoidal in shape, like the response of a cooperative enzyme) and that the response is described by a Hill function. This assumption is reasonable because APC activation is a multistep process; multistep processes often yield ultrasensitive, sigmoidal responses; and, for our purposes, the Hill equation with a Hill coefficient (n) greater than 1 can be thought of as a generic sigmoidal function. Substituting a Hill function for APC* in Equation 1, we get a one-ODE model of a negative feedback loop:

$$\frac{d\,\mathrm{CDK1}^*}{dt} = a_1 - b_1\,\mathrm{CDK1}^*\,\frac{(\mathrm{CDK1}^*)^{n_1}}{K_1^{n_1} + (\mathrm{CDK1}^*)^{n_1}} \qquad \text{[Equation 2]}$$

We now choose, somewhat arbitrarily, values for the model's parameters (a1 = 0.1, b1 = 1, K1 = 0.5, n1 = 8) and initial conditions (CDK1*[0] = 0). We can then numerically integrate Equation 2 over time and see how the concentration of activated CDK1* evolves.

As shown in Figure 3C, the system moves monotonically from its initial state toward a steady state; there is no hint of oscillation. This monotonic approach to steady state is observed no matter what we assume for the parameters and initial conditions. Thus,

Figure 3. A Model of CDK1 Regulation with One Differential Equation
(A) Schematic of the model. The parameters chosen for the model were a1 = 0.1, b1 = 1, K1 = 0.5, and n1 = 8.
(B) Trajectories in one-dimensional phase space, approaching a stable steady state (designated by the filled circle) at CDK1* ≈ 0.43.
(C) Time course of the system, starting with CDK1*[0] = 0 and evolving toward the steady state.
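As a minimal sketch of the numerical integration of the one-ODE model, the Python code below uses a hand-written fourth-order Runge-Kutta step with the parameters listed in the Figure 3 legend. The function names, the step size, and the integration time are assumptions made for illustration; the rate law is constant synthesis minus CDK1* times a Hill function of CDK1*, as described in the text.

```python
def hill(x, K, n):
    """Hill function: a generic sigmoidal response with coefficient n."""
    return x**n / (K**n + x**n)


def dcdk1_dt(cdk1, a1=0.1, b1=1.0, K1=0.5, n1=8):
    """One-ODE model: constant activation minus APC-mediated inactivation,
    with APC treated as an instantaneous Hill function of CDK1*."""
    return a1 - b1 * cdk1 * hill(cdk1, K1, n1)


def integrate(f, x0, dt=0.01, t_end=25.0):
    """Fixed-step fourth-order Runge-Kutta integration of dx/dt = f(x)."""
    x = x0
    trace = [x]
    for _ in range(int(round(t_end / dt))):
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        trace.append(x)
    return trace


# Start with no active CDK1, as in the Figure 3 legend.
cdk1_trace = integrate(dcdk1_dt, 0.0)
print(round(cdk1_trace[-1], 2))
```

With these settings the trace rises and levels off near the steady state reported in the figure legend, without overshooting it. A library ODE solver would serve equally well; the explicit stepper is used here only to keep the sketch self-contained.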

Figure 4. A Two-ODE Model of CDK1 and APC Regulation
(A) Schematic of the model. The parameters chosen for the model were a1 = 0.1, a2 = 3, b1 = 3, b2 = 1, K1 = 0.5, K2 = 0.5, n1 = 8, and n2 = 8.
(B) Phase space depiction of the system. The red and green curves are the two nullclines of the system, which can be thought of as the steady-state response curves for the two individual legs of the feedback loop. The filled black circle at the intersection of the nullclines (with CDK1* ≈ 0.42 and APC* ≈ 0.37) represents a stable steady state. One trajectory is shown, starting at CDK1*[0] = 0, APC*[0] = 0, and spiraling in toward the stable steady state.
(C) Time course of the system, showing damped oscillations approaching the steady state.
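The two-ODE model can be integrated in the same way. The sketch below is an assumption-laden illustration rather than the authors' code: it takes the rate laws described in the text (CDK1* is made at a constant rate and inactivated in proportion to CDK1* times a Hill function of APC*; APC* is activated in proportion to 1 − APC* times a Hill function of CDK1* and inactivated by mass action), uses the parameters from the Figure 4 legend, and applies a hand-written fourth-order Runge-Kutta step with an arbitrarily chosen step size and duration.

```python
def hill(x, K, n):
    """Hill function: a generic sigmoidal response with coefficient n."""
    return x**n / (K**n + x**n)


def rhs(state, a1=0.1, a2=3.0, b1=3.0, b2=1.0, K1=0.5, K2=0.5, n1=8, n2=8):
    """Time derivatives of (CDK1*, APC*) for the two-ODE feedback loop."""
    cdk1, apc = state
    d_cdk1 = a1 - b1 * cdk1 * hill(apc, K1, n1)
    d_apc = a2 * (1.0 - apc) * hill(cdk1, K2, n2) - b2 * apc
    return (d_cdk1, d_apc)


def rk4_step(f, state, dt):
    """One fourth-order Runge-Kutta step for a tuple-valued state."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = f(state)
    k2 = f(shifted(state, k1, 0.5 * dt))
    k3 = f(shifted(state, k2, 0.5 * dt))
    k4 = f(shifted(state, k3, dt))
    return tuple(
        s + dt * (p + 2.0 * q + 2.0 * r + w) / 6.0
        for s, p, q, r, w in zip(state, k1, k2, k3, k4)
    )


def simulate(f, start, dt=0.01, t_end=50.0):
    """Return the list of states visited from t = 0 to t = t_end."""
    trace = [start]
    for _ in range(int(round(t_end / dt))):
        trace.append(rk4_step(f, trace[-1], dt))
    return trace


# Start with CDK1 and APC both inactive, as in the Figure 4 legend.
trace = simulate(rhs, (0.0, 0.0))
final_cdk1, final_apc = trace[-1]
peak_cdk1 = max(cdk1 for cdk1, _ in trace)
print(round(final_cdk1, 2), round(final_apc, 2), round(peak_cdk1, 2))
```

In this sketch CDK1* overshoots its eventual steady-state value before settling, which is the damped behavior described for this model; plotting the second component of `trace` against the first would give the inward spiral in phase space.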

we have not yet built an oscillator model. Even though we were able to produce sustained oscillations with a one-variable Boolean model of a negative feedback loop (Figures 2A and 2D), translating the model into a differential equation eliminated the oscillations.

Another way of representing the system's behavior is through a phase plot, which shows all possible activities of the system. This is similar to the state-space plots that we used for the Boolean analysis, but instead of having a few discrete states, the phase plot displays a continuum, showing how the system's transition between states occurs through a smooth continuum (as we would expect, given that the numerous CDK1 molecules do not all activate simultaneously but "smoothly" turn on). The phase plot contains one dimension for each time-dependent variable. Therefore, in this one-variable model, the phase plot possesses one axis, representing the concentration of activated CDK1* (Figure 3B). In addition, the system's phase plot shows one stable steady state with CDK1* ≈ 0.43. If the system starts off with CDK1 activity less than or greater than 0.43, the system will move along a trajectory back to 0.43. In other words, any initial condition to the left or right of the steady state yields a trajectory moving to the right or left, respectively.

A Two-ODE Model

Why did the one-variable Boolean model produce oscillations (Figures 2A and 2D), whereas the one-ODE model (Equation 2) did not (Figure 3)? The discrete time steps of the Boolean model help to segregate CDK1 activation from inactivation in time. Thus, perhaps adding another ODE (Figure 4A), which acknowledges the fact that APC regulation is not instantaneous, might allow us to generate oscillations.

First, we write an ODE for the activation and inactivation of CDK1 (Equation 3). We once again assume that CDK1 is activated by a constant rate of cyclin synthesis (a1). We assume that the multistep process through which APC* inactivates CDK1* is described by a Hill function. The inactivation rate is therefore proportional to the concentration of CDK1* (the substrate being inactivated) times a Hill function of APC*.

Now for APC (Equation 4), we assume that the rate of its activation by CDK1* is proportional to the concentration of inactive APC (which, assuming the total concentration of active and inactive APC to be constant, we take to be 1 − APC*) times a Hill function of CDK1*, and the rate of inactivation of APC* is described by simple mass action kinetics. The resulting two-ODE model is:

$$\frac{d\,\mathrm{CDK1}^*}{dt} = a_1 - b_1\,\mathrm{CDK1}^*\,\frac{(\mathrm{APC}^*)^{n_1}}{K_1^{n_1} + (\mathrm{APC}^*)^{n_1}} \qquad \text{[Equation 3]}$$

$$\frac{d\,\mathrm{APC}^*}{dt} = a_2\,(1 - \mathrm{APC}^*)\,\frac{(\mathrm{CDK1}^*)^{n_2}}{K_2^{n_2} + (\mathrm{CDK1}^*)^{n_2}} - b_2\,\mathrm{APC}^* \qquad \text{[Equation 4]}$$

Again, we choose kinetic parameters and initial conditions (as described in the caption to Figure 4) and integrate the ODEs numerically. The results are shown in Figures 4B and 4C. The CDK1 activity initially rises as the system moves from interphase (low CDK1 activity) toward M phase (high CDK1 activity) (Figure 4C). After a lag, the APC activity begins to rise too. Then, the rate of CDK1 inactivation (driven by APC activation) exceeds the rate of CDK1 activation (driven by cyclin synthesis), and the CDK1 activity starts to fall. After a few wiggles up and down, the system approaches a steady state with intermediate levels of both CDK1 and APC activities. Thus, we have generated damped oscillations, but not sustained oscillations.

Figure 4B shows the phase space view of these damped oscillations. The phase space is now two dimensional because there are two time-dependent variables. There is a stable steady state that sits at the intersection of two curves called the nullclines (green and red curves, Figure 4B). These two nullclines can be thought of as stimulus-response curves for the two individual legs of the CDK1/APC system. The red nullcline (defined by the equation dCDK1*/dt = 0) represents what the steady-state response of CDK1* to constant levels of APC activity would be if there were no feedback from CDK1* to APC* (Figure 4B). The green nullcline (defined by dAPC*/dt = 0) represents what the steady-state response of APC* to CDK1* would be if there were no feedback from APC* to CDK1* (Figure 4B). For the whole

system to be in steady state, both time derivatives must be zero. Thus, the steady state for the entire system lies where the two nullclines intersect. The steady state is stable, and the trajectory of the system (black curve) spirals in from the initial values of CDK1* and APC* toward the stable steady state (Figure 4B).

878 Cell 144, March 18, 2011 ©2011 Elsevier Inc.

A Three-ODE Model

Perhaps we can improve the oscillations by adding a third species to the model, which increases the lag between CDK1 activation and APC activation (Figure 4C). Here, we will add Plk1 back into the model, as we did in the three-component Boolean model (Figure 2C), with Plk1 assumed to act as an intermediary between CDK1 and APC. We now have three ODEs (Equations 5–7). The equation for the activation and inactivation of CDK1 stays the same (Equation 5). The activation of Plk1 by CDK1* is proportional to the concentration of inactive Plk1 (1 − Plk1*) times a Hill function of CDK1*, and the inactivation is proportional to Plk1* (Equation 6). A similar logic for the activation and inactivation of APC gives Equation 7.

$$\frac{d\,\mathrm{CDK1}^*}{dt} = \alpha_1 - \beta_1\,\mathrm{CDK1}^*\,\frac{(\mathrm{APC}^*)^{n_1}}{K_1^{n_1} + (\mathrm{APC}^*)^{n_1}}$$ [Equation 5]

$$\frac{d\,\mathrm{Plk1}^*}{dt} = \alpha_2\,(1 - \mathrm{Plk1}^*)\,\frac{(\mathrm{CDK1}^*)^{n_2}}{K_2^{n_2} + (\mathrm{CDK1}^*)^{n_2}} - \beta_2\,\mathrm{Plk1}^*$$ [Equation 6]

$$\frac{d\,\mathrm{APC}^*}{dt} = \alpha_3\,(1 - \mathrm{APC}^*)\,\frac{(\mathrm{Plk1}^*)^{n_3}}{K_3^{n_3} + (\mathrm{Plk1}^*)^{n_3}} - \beta_3\,\mathrm{APC}^*$$ [Equation 7]

We arbitrarily choose parameters and initial conditions, and eureka! We now have sustained oscillations (Figures 5B and 5C). Moreover, no matter the initial conditions, the system eventually approaches the same pattern of oscillations, with CDK1 activity peaking first, followed by Plk1 activity and then APC activity (Figure 5C). In the phase plane view, this pattern of oscillations is a limit cycle, a closed loop of states that all trajectories spiral in or out toward (black curve, Figure 5B).

With Equations 5–7, we finally have an ODE model of the Xenopus embryonic cell cycle that exhibits sustained limit cycle oscillations. The key features of this model include the presence of negative feedback, the fact that there are more than two components to the negative feedback loop, and the presence of ultrasensitivity in the individual steps of the loop. These last two features help to generate a time delay in the negative feedback, which helps to keep the system from settling into a stable steady state.

Figure 5. A Three-ODE Model of CDK1, Plk1, and APC Regulation
(A) Schematic of the model. The parameters chosen for the model were α1 = 0.1, α2 = 3, α3 = 3, β1 = 3, β2 = 1, β3 = 1, K1 = 0.5, K2 = 0.5, K3 = 0.5, n1 = 8, n2 = 8, and n3 = 8.
(B) Phase space depiction of the system. The two colored surfaces are two of the three null surfaces of the system. For clarity, we have omitted the third. The open circle at the intersection of the null surfaces (with CDK1* ≈ 0.43, Plk1* ≈ 0.42, and APC* ≈ 0.37) represents an unstable steady state (or unstable spiral). One trajectory is shown, starting at CDK1*[0] = 0, Plk1*[0] = 0, APC*[0] = 0, and spiraling in toward the limit cycle.
(C) Time course of the system, showing sustained limit cycle oscillations.

Linear Stability Analysis

So far, we have confined ourselves to analyzing ODE models through simulations. This provides an intuitive feel for the behavior of a system, but of course, it is never possible to choose all possible values for the kinetic parameters or all possible initial conditions. Is there a way to explain theoretically, rather than computationally, why the one-ODE model failed to oscillate at all, the two-ODE model at best yielded damped oscillations, and the three-ODE model finally yielded sustained oscillations?

The answer is yes, and probably the most straightforward approach is "linear stability analysis." Linear stability analysis is quite remarkable. It assesses the stability of the steady states of the system, and, almost magically, allows the dynamics of the system to be characterized even when the system is far from steady state. To get started with linear stability analysis, we will analyze the steady state of the one-ODE model described in Equation 2.

Linear Stability Analysis of the One-ODE Model

For notational simplicity, we will refer to the rate of change of CDK1 (dCDK1*/dt) as f. This function f can be thought of as a function of CDK1*, which in turn is a function of time. In terms of f, Equation 2 becomes:

$$f = \alpha_1 - \beta_1\,\mathrm{CDK1}^*\,\frac{(\mathrm{CDK1}^*)^{n_1}}{K_1^{n_1} + (\mathrm{CDK1}^*)^{n_1}}$$ [Equation 8]

The system will have a steady state when the derivative dCDK1*/dt equals zero (that is, CDK1* is not changing with respect to time), which means that:

$$f = 0$$ [Equation 9]

We can calculate the value of CDK1* for which Equation 9 is true either numerically or algebraically. For the parameters used in Figure 3, CDK1*ss ≈ 0.43. To the left of the steady state, f is positive (Figure 6); thus, if CDK1* is less than its steady-state value, it will increase with time. Similarly, if CDK1* is greater than its steady-state value, it will decrease with time. This immediately shows that the steady state is stable. With linear stability analysis, we can push this further and determine how stable the steady state is, in quantitative terms.
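As a minimal numerical sketch of this argument, the snippet below locates the steady state of Equation 8 by bisection, checks the sign of f on either side, and estimates the slope of f at the steady state. The Figure 3 parameter values are not restated in this section, so the values used here (α1 = 0.1, β1 = 1, K1 = 0.5, n1 = 8) are an assumption, chosen because they reproduce the reported steady state of about 0.43; the function names are illustrative only.

```python
# ASSUMED parameters (not stated in this section); chosen to match CDK1*_ss of about 0.43.
ALPHA1, BETA1, K1, N1 = 0.1, 1.0, 0.5, 8


def f(cdk1):
    """Right-hand side of Equation 8: constant synthesis minus Hill-type inactivation."""
    hill = cdk1**N1 / (K1**N1 + cdk1**N1)
    return ALPHA1 - BETA1 * cdk1 * hill


def steady_state(lo=0.0, hi=1.0, tol=1e-12):
    """Bisection on f, using f(lo) > 0 > f(hi) and the fact that f decreases with CDK1*."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def slope(x, h=1e-6):
    """Central-difference estimate of df/dCDK1* at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


cdk1_ss = steady_state()
lam = slope(cdk1_ss)
print(f"steady state: {cdk1_ss:.3f}")
print(f"f just below / just above: {f(cdk1_ss - 0.05):+.4f} / {f(cdk1_ss + 0.05):+.4f}")
print(f"slope of f at the steady state: {lam:.2f}")
```

A positive f below the steady state and a negative f above it is the stability argument made in the text; the magnitude of the slope is the quantity that the analysis in the following paragraphs makes precise.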

Imagine that we perturb the system away from the steady state by some small increment δCDK1*. At what rate will CDK1* move back toward CDK1*ss (and δCDK1* move back toward zero)? In other words, how quickly does the system return to equilibrium?

This question can be addressed algebraically with a Taylor series expansion, but perhaps it is easier to approach graphically. This is set up in Figure 6. The x axis represents the concentration of active CDK1*; the y axis represents the rate of change of CDK1*, f; and the blue curve depicts how f varies with CDK1*. When CDK1* = CDK1*ss ≈ 0.43, the system is at steady state and f = 0. To the left of the steady state, the value of f is positive, and the blue curve lies above the x axis. To the right of the steady state, the value of f is negative, and the blue curve lies below the axis.

If the system is perturbed from the steady state by δCDK1*, the rate at which it will return toward the steady state is given by the value of f at CDK1*ss + δCDK1*. For small values of δCDK1*, we can approximate f(CDK1*ss + δCDK1*) by δCDK1* times the slope of the dashed red line (Figure 6), which is the tangent to the blue curve at the steady state. The slope of the dashed red line is defined to be df/dCDK1* evaluated at CDK1*ss. Therefore, the rate at which CDK1* goes toward the steady state, which equals the rate at which δCDK1* goes toward zero, is given by:

$$\frac{d\,\delta\mathrm{CDK1}^*}{dt} = \text{slope}\cdot\delta\mathrm{CDK1}^* = \left.\frac{df}{d\,\mathrm{CDK1}^*}\right|_{\mathrm{CDK1}^*_{ss}}\cdot\,\delta\mathrm{CDK1}^*$$ [Equation 10]

For notational convenience, we will represent this slope by λ. Thus, Equation 10 becomes:

$$\frac{d\,\delta\mathrm{CDK1}^*}{dt} = \lambda\cdot\delta\mathrm{CDK1}^*$$ [Equation 11]

ODEs like Equation 11 show up over and over again in quantitative biology. And, fortunately, it is a particularly simple ODE, probably the only one that most biologists will ever need to solve analytically. Its solution is an exponential function and describes an exponential approach to steady state:

$$\delta\mathrm{CDK1}^*(t) = \delta\mathrm{CDK1}^*(0)\,e^{\lambda t}$$ [Equation 12]

Thus, to determine the stability of the steady state, one simply needs to determine the value of λ by evaluating the derivative df/dCDK1* at the steady state. If λ is negative, the steady state is stable and a small perturbation of the system will return exponentially toward the steady state with a half-time of ln 2/|λ|. The bigger the absolute value of λ, the faster the system approaches the steady state and, in a sense, the more stable the steady state is.

We can now apply linear stability analysis to our one-ODE model (Equation 8). First, we differentiate the right side of the ODE with respect to CDK1*:

$$\frac{df}{d\,\mathrm{CDK1}^*} = -\beta_1\,\frac{(\mathrm{CDK1}^*)^{n_1}\left[(\mathrm{CDK1}^*)^{n_1} + K_1^{n_1}(1 + n_1)\right]}{\left[(\mathrm{CDK1}^*)^{n_1} + K_1^{n_1}\right]^2}$$ [Equation 13]

Next, we evaluate this derivative at CDK1* = CDK1*ss. Because all kinetic parameters are positive numbers and CDK1*ss is always nonnegative, this derivative always evaluates to a negative number (because of the leading negative sign) and the steady state is stable. For the particular choice of parameters given in Figure 3, λ ≈ −1.66.

Figure 6. Linear Stability Analysis for the One-ODE Model
The blue curve represents f as a function of CDK1*. The dashed red line approximates f for small values of δCDK1*.

Stability in the Two-ODE Model and Three-ODE Model

The logic behind linear stability analysis for a two-ODE model is similar; the algebra, though, is more complicated. We start by rewriting Equations 3 and 4, using the shorthand of f and g to represent the rates of change of CDK1* and APC*, respectively.

$$\frac{d\,\mathrm{CDK1}^*}{dt} = f = \alpha_1 - \beta_1\,\mathrm{CDK1}^*\,\frac{(\mathrm{APC}^*)^{n_1}}{K_1^{n_1} + (\mathrm{APC}^*)^{n_1}}$$ [Equation 14]

$$\frac{d\,\mathrm{APC}^*}{dt} = g = \alpha_2\,(1 - \mathrm{APC}^*)\,\frac{(\mathrm{CDK1}^*)^{n_2}}{K_2^{n_2} + (\mathrm{CDK1}^*)^{n_2}} - \beta_2\,\mathrm{APC}^*$$ [Equation 15]

Again, we identify the steady states of the system and consider small perturbations of the system from the steady state. At this point, the procedure becomes more complicated. To quantitatively analyze the stability of the system, we cannot simply calculate one scalar value λ at the steady-state values (CDK1*ss, APC*ss) because the two equations are interdependent. Instead, we need to calculate eigenvalues of the system at the steady state. Eigenvalues are coefficients (real or complex numbers) that yield the same information about stability that we got from the value of λ in the one-dimensional analysis. For present purposes, we will consider them simply as numbers that can be calculated through a straightforward procedure (see Box 1).

The eigenvalues for the two-ODE model turn out to be complex numbers (Box 1). What does that mean? Remember that:

$$e^{\lambda t} = e^{(x + iy)t} = e^{xt}e^{iyt} = e^{xt}(\cos yt + i\sin yt)$$ [Equation 18]

Thus, the real part of λ (x in Equation 18) determines whether the amplitude of oscillations increases or decreases (i.e.,

"dampens") over time: if the real part of λ is negative, the amplitude of the oscillations will decrease by an exponential decay; and if the real part of λ is positive, the oscillations will grow exponentially over time. The imaginary part of λ (y in Equation 18) makes the perturbation oscillate up and down (as sine and cosine functions do). For the parameters that we have chosen for our two-ODE system, we have damped oscillations (the real parts of the eigenvalues are negative, and the imaginary parts are nonzero). And one can show algebraically that, for any choice of parameters, the real parts of the eigenvalues will be negative, and the oscillations will be damped.

So what about our three-ODE model, which did exhibit sustained oscillations? Again, we carry out a linear stability analysis at the steady state of the system (for details, see Box 2). For the choice of parameters that we made above, the eigenvalues are −5.29, 0.88 + 3.47i, and 0.88 − 3.47i. Therefore, the steady state is unstable because two of the eigenvalues have positive real parts and the system exhibits sustained limit cycle oscillations (Figures 5B and 5C).

Summary: Oscillations in ODE Models of Simple Negative Feedback Loops

Using examples motivated by the cell cycle, we have shown that a one-ODE model of a simple negative feedback loop cannot oscillate; a two-ODE model can exhibit damped, but not sustained, oscillations; and a three-ODE model can exhibit sustained limit cycle oscillations. Linear stability analysis of the systems' steady states gave us an explanation for why these behaviors are found.

From this analysis, we conclude that a simple three-ODE negative feedback model seems like a reasonable starting point for describing oscillations in CDK1 activity like those seen in Xenopus embryos. Indeed, some of the earliest models of the cell cycle were simple three-ODE negative feedback loops (Goldbeter, 1991). The ability of a model like this to generate sustained oscillations depends upon the length of the negative feedback loop and the amount of ultrasensitivity assumed for the regulatory interactions within the loop. The longer the loop and the more switch-like the interactions, the easier it is to produce oscillations.

Negative Feedback with a Time Delay

As mentioned above, the mechanism through which CDK1 activates APC is incompletely understood, but it is probably a multistep mechanism with many intermediate species and ample possibility for the introduction of time delays. The same is true for the inactivation of CDK1 by active APC. Given the vagaries of the exact mechanisms, perhaps a reasonable approach would be to leave the formalism of ODEs and make use instead of delay differential equations, in which an explicit time delay relates the change in activity of APC to an earlier activity of CDK1, and vice versa. Consider our two-ODE model (Equations 3 and 4), modified to include two explicit delays, τ1 and τ2:

$$\frac{d\,\mathrm{CDK1}^*[t]}{dt} = \alpha_1 - \beta_1\,\mathrm{CDK1}^*[t]\,\frac{(\mathrm{APC}^*[t - \tau_1])^{n_1}}{K_1^{n_1} + (\mathrm{APC}^*[t - \tau_1])^{n_1}}$$ [Equation 23]

$$\frac{d\,\mathrm{APC}^*[t]}{dt} = \alpha_2\,(1 - \mathrm{APC}^*[t])\,\frac{(\mathrm{CDK1}^*[t - \tau_2])^{n_2}}{K_2^{n_2} + (\mathrm{CDK1}^*[t - \tau_2])^{n_2}} - \beta_2\,\mathrm{APC}^*[t]$$ [Equation 24]

Box 1. Obtaining Eigenvalues for the Two-ODE Model

First, we set up the system's Jacobian matrix A, which is a table of the two partial derivatives of f and the two partial derivatives of g:

$$A = \begin{pmatrix} \dfrac{\partial f}{\partial\,\mathrm{CDK1}^*} & \dfrac{\partial f}{\partial\,\mathrm{APC}^*} \\[2ex] \dfrac{\partial g}{\partial\,\mathrm{CDK1}^*} & \dfrac{\partial g}{\partial\,\mathrm{APC}^*} \end{pmatrix}$$ [Equation 16]

Next, we evaluate these four partial derivatives at the steady state, yielding a matrix of four numbers, $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Finally, we use these four numbers to calculate the eigenvalues. For a two-ODE system, the eigenvalues are given by:

$$\lambda_1 = \frac{\tau + \sqrt{\tau^2 - 4\Delta}}{2}, \qquad \lambda_2 = \frac{\tau - \sqrt{\tau^2 - 4\Delta}}{2}$$

where

$$\tau = \mathrm{trace}(A) = a + d, \qquad \Delta = \det(A) = ad - bc$$ [Equation 17]

For our two-ODE model, the eigenvalues turn out to be λ1,2 ≈ −0.91 ± 3.30i.

Box 2. Obtaining Eigenvalues for the Three-ODE Model

We write the three ODEs as:

$$\frac{d\,\mathrm{CDK1}^*}{dt} = f = \alpha_1 - \beta_1\,\mathrm{CDK1}^*\,\frac{(\mathrm{APC}^*)^{n_1}}{K_1^{n_1} + (\mathrm{APC}^*)^{n_1}}$$ [Equation 19]

$$\frac{d\,\mathrm{Plk1}^*}{dt} = g = \alpha_2\,(1 - \mathrm{Plk1}^*)\,\frac{(\mathrm{CDK1}^*)^{n_2}}{K_2^{n_2} + (\mathrm{CDK1}^*)^{n_2}} - \beta_2\,\mathrm{Plk1}^*$$ [Equation 20]

$$\frac{d\,\mathrm{APC}^*}{dt} = h = \alpha_3\,(1 - \mathrm{APC}^*)\,\frac{(\mathrm{Plk1}^*)^{n_3}}{K_3^{n_3} + (\mathrm{Plk1}^*)^{n_3}} - \beta_3\,\mathrm{APC}^*$$ [Equation 21]

Next, we set up the Jacobian matrix and calculate the partial derivatives:

$$A = \begin{pmatrix} \dfrac{\partial f}{\partial\,\mathrm{CDK1}^*} & \dfrac{\partial f}{\partial\,\mathrm{Plk1}^*} & \dfrac{\partial f}{\partial\,\mathrm{APC}^*} \\[2ex] \dfrac{\partial g}{\partial\,\mathrm{CDK1}^*} & \dfrac{\partial g}{\partial\,\mathrm{Plk1}^*} & \dfrac{\partial g}{\partial\,\mathrm{APC}^*} \\[2ex] \dfrac{\partial h}{\partial\,\mathrm{CDK1}^*} & \dfrac{\partial h}{\partial\,\mathrm{Plk1}^*} & \dfrac{\partial h}{\partial\,\mathrm{APC}^*} \end{pmatrix}$$ [Equation 22]

Finally, we calculate the three eigenvalues. For the choice of parameters we made in Figure 5, the eigenvalues are −5.29, 0.88 + 3.47i, and 0.88 − 3.47i.
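The procedure described in Boxes 1 and 2 can be sketched numerically as follows. This is an illustrative sketch, not the authors' code: it finds each steady state by bisection, builds the Jacobian by finite differences rather than by hand-derived partial derivatives, and then applies the trace/determinant formula (two-ODE model) or solves the characteristic cubic (three-ODE model). The parameters are those of the Figure 5 caption; using the same α, β, K, and n values for the two-ODE model is an assumption, and all function names are hypothetical.

```python
import cmath

# Figure 5 caption values; reusing them for the two-ODE model is an ASSUMPTION.
A1, A2, A3 = 0.1, 3.0, 3.0
B1, B2, B3 = 3.0, 1.0, 1.0
K, N = 0.5, 8


def hill(x):
    return x**N / (K**N + x**N)


def relay(x, alpha, beta):
    """Steady-state output of one activation leg (e.g. Equation 15 = 0) for input x."""
    return alpha * hill(x) / (alpha * hill(x) + beta)


def two_ode(s):
    cdk1, apc = s
    return [A1 - B1 * cdk1 * hill(apc), A2 * (1 - apc) * hill(cdk1) - B2 * apc]


def three_ode(s):
    cdk1, plk1, apc = s
    return [A1 - B1 * cdk1 * hill(apc),
            A2 * (1 - plk1) * hill(cdk1) - B2 * plk1,
            A3 * (1 - apc) * hill(plk1) - B3 * apc]


def steady_state(with_plk1):
    """Bisection on CDK1*: chain the relay legs, then balance the CDK1 equation."""
    def chain(cdk1):
        state = [cdk1, relay(cdk1, A2, B2)]
        if with_plk1:
            state.append(relay(state[1], A3, B3))
        return state
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if A1 - B1 * mid * hill(chain(mid)[-1]) > 0:
            lo = mid
        else:
            hi = mid
    return chain(0.5 * (lo + hi))


def jacobian(rhs, s, h=1e-6):
    """Central-difference Jacobian: entry [i][j] is d(rhs_i)/d(s_j)."""
    n = len(s)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, dn = list(s), list(s)
        up[j] += h
        dn[j] -= h
        f_up, f_dn = rhs(up), rhs(dn)
        for i in range(n):
            jac[i][j] = (f_up[i] - f_dn[i]) / (2 * h)
    return jac


def eig2(m):
    """Equation 17: eigenvalues from the trace and determinant."""
    tau = m[0][0] + m[1][1]
    delta = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(tau * tau - 4 * delta)
    return [(tau + root) / 2, (tau - root) / 2]


def eig3(m):
    """Characteristic cubic: one real root by bisection, then the remaining quadratic."""
    c2 = m[0][0] + m[1][1] + m[2][2]                                  # trace
    c1 = sum(m[i][i] * m[j][j] - m[i][j] * m[j][i]
             for i, j in ((0, 1), (0, 2), (1, 2)))                    # principal minors
    c0 = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))        # determinant
    hi = 1 + max(abs(c2), abs(c1), abs(c0))                           # root bound
    lo = -hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mid**3 - c2 * mid**2 + c1 * mid - c0 < 0:
            lo = mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)
    b = r - c2
    c = c1 + r * b
    root = cmath.sqrt(b * b - 4 * c)
    return [r, (-b + root) / 2, (-b - root) / 2]


eigs2 = eig2(jacobian(two_ode, steady_state(with_plk1=False)))
eigs3 = eig3(jacobian(three_ode, steady_state(with_plk1=True)))
print("two-ODE eigenvalues:  ", [f"{e:.2f}" for e in eigs2])
print("three-ODE eigenvalues:", [f"{e:.2f}" for e in eigs3])
```

The finite-difference Jacobian is a convenience of this sketch; the analytic partial derivatives of Equations 16 and 22 would serve equally well. The sign of the real parts is what separates the damped two-ODE model from the oscillating three-ODE model.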

Figure 7. A Delay Differential Equation Model of CDK1 and APC Regulation
(A) Schematic of the model. The parameters chosen for the model were α1 = 0.1, α2 = 3, β1 = 3, β2 = 1, K1 = 0.5, K2 = 0.5, n1 = 8, n2 = 8, τ1 = 0.5, and τ2 = 0.5.
(B) Phase space depiction of the system. The red and green lines are the nullclines. One trajectory is shown. The initial history for this trajectory was CDK1*[t ≤ 0] = 0, APC*[t ≤ 0] = 0. The trajectory spirals in toward a limit cycle.
(C) Time course of the system, showing sustained limit cycle oscillations.
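Equations 23 and 24 can be integrated with a simple fixed-step scheme that keeps past values in a buffer. The sketch below uses the parameters and zero initial history from the Figure 7 caption; the forward Euler method, the step size, and the simulation length are assumptions of this sketch rather than the method used for the figure, and a dedicated delay differential equation solver would be the more careful choice.

```python
# Parameters and zero initial history from the Figure 7 caption.
A1, A2, B1, B2 = 0.1, 3.0, 3.0, 1.0
K, N = 0.5, 8
TAU1, TAU2 = 0.5, 0.5
DT, T_END = 0.001, 60.0   # ASSUMED step size and duration for this sketch


def hill(x):
    return x**N / (K**N + x**N)


def simulate():
    """Forward Euler integration of Equations 23 and 24 with a history buffer."""
    lag1, lag2 = round(TAU1 / DT), round(TAU2 / DT)
    pad = max(lag1, lag2)
    # The first pad + 1 samples hold the constant history for t <= 0.
    cdk1 = [0.0] * (pad + 1)
    apc = [0.0] * (pad + 1)
    for _ in range(round(T_END / DT)):
        apc_past = apc[-1 - lag1]      # APC* at time t - tau1
        cdk1_past = cdk1[-1 - lag2]    # CDK1* at time t - tau2
        d_cdk1 = A1 - B1 * cdk1[-1] * hill(apc_past)
        d_apc = A2 * (1 - apc[-1]) * hill(cdk1_past) - B2 * apc[-1]
        cdk1.append(cdk1[-1] + DT * d_cdk1)
        apc.append(apc[-1] + DT * d_apc)
    return cdk1[pad:], apc[pad:]       # samples from t = 0 onward


cdk1, apc = simulate()
late = cdk1[len(cdk1) // 2:]           # discard the initial transient
print(f"late-time CDK1* range: {min(late):.2f} to {max(late):.2f}")
```

A late-time range that stays wide, rather than shrinking toward a single value, is the signature of sustained rather than damped oscillations. Setting TAU1 and TAU2 to very small values should recover the damped behavior of the plain two-ODE model.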

Here, the rate of change of CDK1 activity at time t depends on APC activity at time t − τ1, and the rate of change of APC activity at time t depends on CDK1 activity at time t − τ2.

This two-equation model now yields sustained limit cycle oscillations (Figure 7) once the time delays exceed a fairly small critical value. Even a model of a negative feedback loop with only one delay differential equation can be made to oscillate. The explicit time delays, like the discrete time steps in the Boolean model (Figure 2), help to keep the activities of CDK1 and APC from settling into a stable steady state.

Delay differential equation models have been used to rationalize the robust oscillations seen in some synthetic biochemical oscillators based on negative feedback loops (Stricker et al., 2008) and have been proposed to model the embryonic cell cycle in Xenopus as well (Busenberg and Tang, 1994).

Adding a Bistable Trigger

To this point, we have ignored an important part of the scheme shown in Figure 1, the positive feedback loop (CDK1 activates Cdc25, which in turn activates CDK1) and the double-negative feedback loop (CDK1 inhibits Wee1, which in turn inhibits CDK1). Nevertheless, this is probably a critical part of the network; every eukaryotic species examined so far has at least one identifiable Wee1 homolog, and all eukaryotic species except higher plants have at least one Cdc25 homolog. In addition, genetic studies in S. pombe identified these genes as critical for cell-cycle oscillations (Russell and Nurse, 1986, 1987), although, surprisingly, they become less important in S. pombe strains engineered to run off a single cyclin/Cdk fusion protein (Coudreuse and Nurse, 2010). Biochemical studies in Xenopus egg extracts and gene disruption studies in human HeLa cells also provide evidence that these feedback loops are important for the cell cycle (Pomerening et al., 2005, 2008). What do these positive and double-negative feedback loops add to the oscillator?

On their own, positive or double-negative feedback loops can accomplish several things. For example, they can amplify the magnitude of a signal and can amplify the system's sensitivity to a change in a signal. However, these feedback loops are probably best known for their potential to function as bistable, hysteretic toggle switches (Ferrell, 2002; Ferrell and Xiong, 2001; Gardner et al., 2000; Tyson et al., 2003). The term "bistable" means that the system can be in either of two alternative, stable steady states, depending upon its history, and the term "hysteretic" means that, once the system has been switched from one state to the other, it tends to stay there. Indeed, experimental studies have shown that, in Xenopus egg extracts, the CDK1/Wee1/Cdc25 system does respond to cyclin in a hysteretic fashion; it is easier to maintain an extract in M phase than it is to push an interphase extract into M phase (Pomerening et al., 2003; Sha et al., 2003). Thus, mitosis is driven by a bistable trigger. How would a bistable trigger alter our simple model of the cell cycle?

Let us begin again with our two-ODE model (Equations 3 and 4) but now add an additional positive feedback term (Equation 25) to the first equation, accounting for the fact that active CDK1 promotes the formation of more active CDK1 in a highly nonlinear way:

[Equation 25]

$$\frac{d\,\mathrm{APC}^*}{dt} = \alpha_2\,(1 - \mathrm{APC}^*)\,\frac{(\mathrm{CDK1}^*)^{n_2}}{K_2^{n_2} + (\mathrm{CDK1}^*)^{n_2}} - \beta_2\,\mathrm{APC}^*$$ [Equation 26]

Moreover, we assume that the basal rate of CDK1 activation, α1 (essentially the cyclin synthesis rate), is slow compared to the other activation and inactivation rates.

Now, let us examine the system one leg at a time. First, we look at how the steady-state APC* activity would vary with CDK1* activity if there were no feedback from APC to CDK1. This dependency is given by the solution of the equation:

$$\alpha_2\,(1 - \mathrm{APC}^*)\,\frac{(\mathrm{CDK1}^*)^{n_2}}{K_2^{n_2} + (\mathrm{CDK1}^*)^{n_2}} - \beta_2\,\mathrm{APC}^* = 0$$ [Equation 27]

Equation 27 defines one of the nullclines for the two-ODE system (shown in green in Figure 8B). This nullcline is a monotonic, sigmoidal curve. When CDK1* is low, APC* is low; when CDK1* is high, APC* is high; and in between, APC* is intermediate in activity.

However, the other nullcline (shown in red in Figure 8B), which describes how the steady-state activity of CDK1 would vary with APC* in the absence of feedback from CDK1 to APC, is qualitatively different. It is not just sigmoidal, it is S shaped. This means that there are three possible steady-state values of CDK1* for

Figure 8. Interlinked Positive and Negative Feedback Loops in a Two-Component Model of CDK1 and APC Regulation
(A) Schematic of the model. The parameters chosen for the model were α1 = 0.02, α2 = 3, α3 = 3, β1 = 3, β2 = 1, K1 = 0.5, K2 = 0.5, K3 = 0.5, n1 = 8, n2 = 8, and n3 = 8.
(B) Phase space depiction of the system. The red and green lines are the nullclines. They intersect at an unstable steady state designated by the open circle. All trajectories spiral in or out toward a stable limit cycle, denoted by the closed black loop.
(C) Time course of the system, showing sustained limit cycle oscillations.
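The model of Figure 8 can be explored with the sketch below. Equation 25 is not reproduced in this text, so the CDK1 equation used here is an assumption: Equation 3 plus a self-activation term α3 (1 − CDK1*) times a Hill function of CDK1*, with α3, K3, and n3 read from the Figure 8 caption. With that assumed form the computed eigenvalues come out close to the values of about 1.12 ± 4.77i reported for this model, which lends the assumption some support. The integrator, step size, and function names are likewise choices of this sketch.

```python
import cmath

# Figure 8 caption values (alpha3, K3, n3 belong to the assumed extra term).
A1, A2, A3 = 0.02, 3.0, 3.0
B1, B2 = 3.0, 1.0
K, N = 0.5, 8


def hill(x):
    return x**N / (K**N + x**N)


def rhs(cdk1, apc):
    # ASSUMED form of Equation 25: Equation 3 plus A3 * (1 - CDK1*) * hill(CDK1*).
    d_cdk1 = A1 - B1 * cdk1 * hill(apc) + A3 * (1 - cdk1) * hill(cdk1)
    d_apc = A2 * (1 - apc) * hill(cdk1) - B2 * apc        # Equation 26
    return d_cdk1, d_apc


def apc_nullcline(cdk1):
    """Solution of Equation 27 for APC* at a given CDK1*."""
    return A2 * hill(cdk1) / (A2 * hill(cdk1) + B2)


def steady_state():
    """Bisection along the APC nullcline; d_cdk1 is > 0 at CDK1* = 0 and < 0 at 1."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if rhs(mid, apc_nullcline(mid))[0] > 0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return x, apc_nullcline(x)


def eigenvalues(x, y, h=1e-6):
    """Finite-difference Jacobian, then the trace/determinant formula of Equation 17."""
    a = (rhs(x + h, y)[0] - rhs(x - h, y)[0]) / (2 * h)
    b = (rhs(x, y + h)[0] - rhs(x, y - h)[0]) / (2 * h)
    c = (rhs(x + h, y)[1] - rhs(x - h, y)[1]) / (2 * h)
    d = (rhs(x, y + h)[1] - rhs(x, y - h)[1]) / (2 * h)
    tau, delta = a + d, a * d - b * c
    root = cmath.sqrt(tau * tau - 4 * delta)
    return (tau + root) / 2, (tau - root) / 2


def simulate(x=0.0, y=0.0, dt=0.005, t_end=100.0):
    """Classical fourth-order Runge-Kutta integration; returns the CDK1* trace."""
    trace = [x]
    for _ in range(round(t_end / dt)):
        k1 = rhs(x, y)
        k2 = rhs(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
        k3 = rhs(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
        k4 = rhs(x + dt * k3[0], y + dt * k3[1])
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        trace.append(x)
    return trace


cdk1_ss, apc_ss = steady_state()
ev = eigenvalues(cdk1_ss, apc_ss)
trace = simulate()
late = trace[len(trace) // 2:]
print(f"steady state: CDK1* = {cdk1_ss:.3f}, APC* = {apc_ss:.3f}")
print("eigenvalues:", [f"{e:.2f}" for e in ev])
print(f"late-time CDK1* range: {min(late):.2f} to {max(late):.2f}")
```

Setting A3 to zero removes the assumed positive feedback term and returns the plain negative feedback model, whose eigenvalues have negative real parts.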

a given APC activity when APC* is within a certain range (APC* ≈ 0.35 to 0.5, shown by pink shading in Figure 8B). By applying linear stability analysis to this one-dimensional system (or rate balance analysis, which is an easier way to analyze the stability of steady states in one-dimensional systems [Ferrell and Xiong, 2001]), one can show that the left and right steady states are stable, and the middle one is an unstable threshold. Thus, we have chosen parameters such that one leg of the CDK1/APC system functions like a hysteretic, bistable toggle switch. As APC* increases, CDK1* decreases toward the edge of a cliff (at APC* ≈ 0.5) and then falls precipitously to a very low level. Then, as APC* decreases, CDK1* rises only slightly until APC* ≈ 0.35, whereupon it shoots sky-high.

When this toggle switch is coupled to a negative feedback loop, the result can be stable limit cycle oscillations, and for the parameters chosen here, that is what we get (Figures 8B and 8C). CDK1 activity rises slowly at first and then explodes upward toward high mitotic levels. This is followed closely by a rapid rise in APC*, which changes the rapid rise in CDK1* to a similarly precipitous fall. Once CDK1* has fallen enough to turn APC back off, the system begins to slowly ramp up toward its next spike. The oscillations in CDK1 activity shown in Figure 8C look qualitatively similar to those seen in cycling Xenopus egg extracts (Murray and Kirschner, 1989a; Murray et al., 1989; Pomerening et al., 2003) and in HeLa cells in culture (Gavet and Pines, 2010). Accordingly, models that combine positive and negative feedback loops have dominated the cell-cycle modeling field since its beginning (Novak and Tyson, 1993a; Tyson and Kauffman, 1975).

However, the oscillations shown in Figure 8C are qualitatively quite different from the oscillations that we observed from our ODE models of simple negative feedback loops (Figure 4 and Figure 5). The oscillations for the positive-plus-negative feedback model are spiky, not smooth (Figure 8C), and there are distinct slow and fast phases. This is the type of oscillation that is exhibited by pacemaker cells in a beating heart and by dripping water faucets, and it is termed a "relaxation oscillation."

Biological oscillator circuits often do include positive feedback loops, suggesting that relaxation oscillators may be particularly easy to evolve or may have particular performance advantages that make them especially suitable for biological applications (Holt et al., 2008; Pomerening et al., 2003; Skotheim et al., 2008; Tsai et al., 2008).

Why can this two-ODE system oscillate, whereas the straight negative feedback two-ODE system could not? It is because positive feedback adds a type of time delay to the system, making the ODE model behave more like a delay differential equation model. The typical response of a system without positive feedback is a gradual, progressively slowing approach to a steady state. In contrast, a system with positive feedback first simmers and then explodes. This simmering phase is essentially a time lag, and it facilitates the generation of oscillations.

Accordingly, we expect that the stable steady state seen in the straight negative feedback two-ODE system (Figure 4) must be destabilized in the positive-plus-negative feedback system (Figure 8). Indeed, this is the case. Linear stability analysis yielded eigenvalues of λ1,2 ≈ −0.91 ± 3.30i for the negative feedback-only system; now, with positive feedback added, the eigenvalues are 1.12 ± 4.77i. The real part of the eigenvalues is positive, so the steady state is unstable; the imaginary part of the eigenvalues is nonzero, so there are oscillations.

At this point, we have an oscillator model composed of two ODEs, representing two interlinked feedback loops. By adding more ODEs, the model can be made more realistic. For example, one could divide the process of CDK1 activation into its two most critical steps: the production of cyclin-CDK1 complexes through the synthesis of cyclin and regulation of the complexes' activity through phosphorylation and dephosphorylation. This additional realism comes at the cost of additional complexity; the more ODEs, the harder it is to understand why the system behaves the way that it does.

Conclusion

The Xenopus embryonic cell cycle is driven by a protein circuit that acts like an autonomous oscillator. In this Primer, we set out to explore how oscillations can arise from a protein circuit. We examined three types of models of simple oscillator circuits based on the CDK1/APC system: Boolean models, ordinary differential equation models, and delay differential equation models. The discrete character of Boolean models and the time lags introduced into

delay differential equation models make it relatively easy to generate oscillations. For ODE models, it is more difficult to keep the model from settling into a stable steady state. With everything else equal, longer negative feedback loops are easier to get oscillating than shorter ones, and switch-like, ultrasensitive response functions within the negative feedback loop promote oscillations as well. Adding a positive feedback loop to a negative feedback loop tends to promote oscillations, and oscillators with this bistable trigger have distinct characteristics that might make them particularly suitable for biological systems.

Linear stability analysis addresses why one ODE model oscillates and another one does not. Accordingly, we have presented several examples of stability analysis for simple oscillator circuits. For one-ODE systems, linear stability analysis is fairly simple. For two or more ODEs, however, one must make use of matrix algebra manipulations, calculating the eigenvalues of the system at the steady state(s). This takes some effort, but the effort is worth it: it provides us with an understanding of why a circuit does or does not oscillate.

In many eukaryotic cells, the cell cycle is driven by a CDK1/APC circuit that behaves more like a succession of decisions or contingent events than an autonomous oscillator. Nevertheless, simple models of the Xenopus oscillator, such as the ones discussed here, provide insight that informs the understanding of more complex cell-cycle circuits. Just as positive feedback loops can provide an oscillator circuit with robustness, positive feedback loops can be used to build a succession of reliable switches. We suspect that the link between clock-like cell cycles (like the Xenopus embryonic cycle) and domino-like cell cycles (like the somatic cell cycle) is that they are both constructed out of bistable switches.

It is clear that the Xenopus embryonic cell cycle can operate in the absence of transcription. Therefore, we have regarded the cell-cycle oscillator as only a protein circuit: proteins regulate other proteins, but not gene expression. Nonetheless, many cell-cycle regulators in many cell types undergo periodic transcription (Spellman et al., 1998). Indeed, in budding yeast, transcriptional oscillations persist in the absence of CDK1 oscillations (Haase and Reed, 1999; Orlando et al., 2008). Transcriptional regulation undoubtedly contributes to the overall functioning of the cell-cycle oscillator, with the protein oscillator acting as a basic core circuit upon which additional controls have been layered.

The same may be true of another well-studied biological oscillator, the circadian clock. The slow pace of the circadian clock makes it natural to think of the clock as arising from a transcriptional gene circuit. Nevertheless, in cyanobacteria (Nakajima et al., 2005; Rust et al., 2007; Tomita et al., 2005), Ostreococcus (O'Neill et al., 2011), and human red blood cells (O'Neill and Reddy, 2011), circadian oscillations can proceed in the absence of transcription. Perhaps core protein circuits constitute the basic circadian clock, with transcriptional circuits reinforcing and refining the clock's behavior (Zwicker et al., 2010).

In any case, whether one is interested in gene circuits or protein circuits, and in cell-cycle oscillations or circadian oscillations, the basic concepts and tools that we have reviewed here (negative feedback loops, bistable triggers, time lags, and linear stability analysis) should prove helpful. Our hope is that the detailed analysis of particular oscillator circuits, coupled with the comparative analysis of different biological oscillators, will allow us to gain insight into the basic design principles of all of these fascinating clocks.

ACKNOWLEDGMENTS

We thank Markus Covert for the idea of starting this tutorial with a Boolean analysis instead of plunging right into ODEs and David Dill for helpful discussions. This work was supported by NIH GM077544.

REFERENCES

Aguda, B.D., and Tang, Y. (1999). The kinetic origins of the restriction point in the mammalian cell cycle. Cell Prolif. 32, 321–335.

Alfieri, R., Barberis, M., Chiaradonna, F., Gaglio, D., Milanesi, L., Vanoni, M., Klipp, E., and Alberghina, L. (2009). Towards a systems biology approach to mammalian cell cycle: modeling the entrance into S phase of quiescent fibroblasts after serum stimulation. BMC Bioinformatics 10 (Suppl 12), S16.

Barberis, M., Klipp, E., Vanoni, M., and Alberghina, L. (2007). Cell size at S phase initiation: an emergent property of the G1/S network. PLoS Comput. Biol. 3, e64.

Borisuk, M.T., and Tyson, J.J. (1998). Bifurcation analysis of a model of mitotic control in frog eggs. J. Theor. Biol. 195, 69–85.

Braunewell, S., and Bornholdt, S. (2007). Superstability of the yeast cell-cycle dynamics: ensuring causality in the presence of biochemical stochasticity. J. Theor. Biol. 245, 638–643.

Busenberg, S., and Tang, B. (1994). Mathematical models of the early embryonic cell cycle: the role of MPF activation and cyclin degradation. J. Math. Biol. 32, 573–596.

Charvin, G., Oikonomou, C., Siggia, E.D., and Cross, F.R. (2010). Origin of irreversibility of cell cycle start in budding yeast. PLoS Biol. 8, e1000284.

Chen, K.C., Calzone, L., Csikasz-Nagy, A., Cross, F.R., Novak, B., and Tyson, J.J. (2004). Integrative analysis of cell cycle control in budding yeast. Mol. Biol. Cell 15, 3841–3862.

Ciliberto, A., Novak, B., and Tyson, J.J. (2003). Mathematical model of the morphogenesis checkpoint in budding yeast. J. Cell Biol. 163, 1243–1254.

Coudreuse, D., and Nurse, P. (2010). Driving the cell cycle with a minimal CDK control network. Nature 468, 1074–1079.

Dasso, M., and Newport, J.W. (1990). Completion of DNA replication is monitored by a feedback system that controls the initiation of mitosis in vitro: studies in Xenopus. Cell 61, 811–823.

Davidich, M.I., and Bornholdt, S. (2008). Boolean network model predicts cell cycle sequence of fission yeast. PLoS One 3, e1672.

Ferrell, J.E., Jr. (2002). Self-perpetuating states in signal transduction: positive feedback, double-negative feedback and bistability. Curr. Opin. Cell Biol. 14, 140–148.

Ferrell, J.E., Jr., and Xiong, W. (2001). Bistability in cell signaling: How to make continuous processes discontinuous, and reversible processes irreversible. Chaos 11, 227–236.

Gardner, T.S., Cantor, C.R., and Collins, J.J. (2000). Construction of a genetic toggle switch in Escherichia coli. Nature 403, 339–342.

Gavet, O., and Pines, J. (2010). Progressive activation of CyclinB1-Cdk1 coordinates entry to mitosis. Dev. Cell 18, 533–543.

Ge, H., Qian, H., and Qian, M. (2008). Synchronized dynamics and non-equilibrium steady states in a stochastic yeast cell-cycle network. Math. Biosci. 211, 132–152.

Gilbert, D.A. (1974). The nature of the cell cycle and the control of cell proliferation. Curr. Mod. Biol. 5, 197–206.

Glass, L., and Kauffman, S.A. (1973). The logical analysis of continuous, non-linear biochemical control networks. J. Theor. Biol. 39, 103–129.

Goldbeter, A. (1991). A minimal cascade model for the mitotic oscillator involving cyclin and cdc2 kinase. Proc. Natl. Acad. Sci. USA 88, 9107–9111.

Goldbeter, A. (2002). Computational approaches to cellular rhythms. Nature 420, 238–245.
Goldbeter, A., and Guilmot, J.M. (1996). Arresting the mitotic oscillator and the control of cell proliferation: insights from a cascade model for cdc2 kinase activation. Experientia 52, 212–216.
Haase, S.B., and Reed, S.I. (1999). Evidence that a free-running oscillator drives G1 events in the budding yeast cell cycle. Nature 401, 394–397.
Hara, K., Tydeman, P., and Kirschner, M. (1980). A cytoplasmic clock with the same period as the division cycle in Xenopus eggs. Proc. Natl. Acad. Sci. USA 77, 462–466.
Hartwell, L.H., and Weinert, T.A. (1989). Checkpoints: controls that ensure the order of cell cycle events. Science 246, 629–634.
Holt, L.J., Krutchinsky, A.N., and Morgan, D.O. (2008). Positive feedback sharpens the anaphase switch. Nature 454, 353–357.
Kauffman, S., and Wille, J.J. (1975). The mitotic oscillator in Physarum polycephalum. J. Theor. Biol. 55, 47–93.
Li, B., Shao, B., Yu, C., Ouyang, Q., and Wang, H. (2010). A mathematical model for cell size control in fission yeast. J. Theor. Biol. 264, 771–781.
Li, F., Long, T., Lu, Y., Ouyang, Q., and Tang, C. (2004). The yeast cell-cycle network is robustly designed. Proc. Natl. Acad. Sci. USA 101, 4781–4786.
Mangla, K., Dill, D.L., and Horowitz, M.A. (2010). Timing robustness in the budding and fission yeast cell cycles. PLoS ONE 5, e8906.
Minshull, J., Sun, H., Tonks, N.K., and Murray, A.W. (1994). A MAP kinase-dependent spindle assembly checkpoint in Xenopus egg extracts. Cell 79, 475–486.
Mura, I., and Csikász-Nagy, A. (2008). Stochastic Petri Net extension of a yeast cell cycle model. J. Theor. Biol. 254, 850–860.
Murray, A.W., and Kirschner, M.W. (1989a). Cyclin synthesis drives the early embryonic cell cycle. Nature 339, 275–280.
Murray, A.W., and Kirschner, M.W. (1989b). Dominoes and clocks: the union of two views of the cell cycle. Science 246, 614–621.
Murray, A.W., Solomon, M.J., and Kirschner, M.W. (1989). The role of cyclin synthesis and degradation in the control of maturation promoting factor activity. Nature 339, 280–286.
Nakajima, M., Imai, K., Ito, H., Nishiwaki, T., Murayama, Y., Iwasaki, H., Oyama, T., and Kondo, T. (2005). Reconstitution of circadian oscillation of cyanobacterial KaiC phosphorylation in vitro. Science 308, 414–415.
Norel, R., and Agur, Z. (1991). A model for the adjustment of the mitotic clock by cyclin and MPF levels. Science 251, 1076–1078.
Novak, B., Csikasz-Nagy, A., Gyorffy, B., Chen, K., and Tyson, J.J. (1998). Mathematical model of the fission yeast cell cycle with checkpoint controls at the G1/S, G2/M and metaphase/anaphase transitions. Biophys. Chem. 72, 185–200.
Novak, B., and Tyson, J.J. (1993a). Modeling the cell division cycle: M-phase trigger, oscillations, and size control. J. Theor. Biol. 165, 101–134.
Novak, B., and Tyson, J.J. (1993b). Numerical analysis of a comprehensive model of M-phase control in Xenopus oocyte extracts and intact embryos. J. Cell Sci. 106, 1153–1168.
Novak, B., and Tyson, J.J. (1997). Modeling the control of DNA replication in fission yeast. Proc. Natl. Acad. Sci. USA 94, 9147–9152.
Novák, B., and Tyson, J.J. (2008). Design principles of biochemical oscillators. Nat. Rev. Mol. Cell Biol. 9, 981–991.
O’Neill, J.S., and Reddy, A.B. (2011). Circadian clocks in human red blood cells. Nature 469, 498–503.
O’Neill, J.S., van Ooijen, G., Dixon, L.E., Troein, C., Corellou, F., Bouget, F.Y., Reddy, A.B., and Millar, A.J. (2011). Circadian rhythms persist without transcription in a eukaryote. Nature 469, 554–558.
Obeyesekere, M.N., Tucker, S.L., and Zimmerman, S.O. (1992). Mathematical models for the cellular concentrations of cyclin and MPF. Biochem. Biophys. Res. Commun. 184, 782–789.
Okabe, Y., and Sasai, M. (2007). Stable stochastic dynamics in yeast cell cycle. Biophys. J. 93, 3451–3459.
Orlando, D.A., Lin, C.Y., Bernard, A., Wang, J.Y., Socolar, J.E., Iversen, E.S., Hartemink, A.J., and Haase, S.B. (2008). Global control of cell-cycle transcription by coupled CDK and network oscillators. Nature 453, 944–947.
Pomerening, J.R., Kim, S.Y., and Ferrell, J.E., Jr. (2005). Systems-level dissection of the cell-cycle oscillator: bypassing positive feedback produces damped oscillations. Cell 122, 565–578.
Pomerening, J.R., Sontag, E.D., and Ferrell, J.E., Jr. (2003). Building a cell cycle oscillator: hysteresis and bistability in the activation of Cdc2. Nat. Cell Biol. 5, 346–351.
Pomerening, J.R., Ubersax, J.A., and Ferrell, J.E., Jr. (2008). Rapid cycling and precocious termination of G1 phase in cells expressing CDK1AF. Mol. Biol. Cell 19, 3426–3441.
Russell, P., and Nurse, P. (1986). cdc25+ functions as an inducer in the mitotic control of fission yeast. Cell 45, 145–153.
Russell, P., and Nurse, P. (1987). Negative regulation of mitosis by wee1+, a gene encoding a protein kinase homolog. Cell 49, 559–567.
Rust, M.J., Markson, J.S., Lane, W.S., Fisher, D.S., and O’Shea, E.K. (2007). Ordered phosphorylation governs oscillation of a three-protein circadian clock. Science 318, 809–812.
Sel’kov, E.E. (1970). [2 alternative autooscillatory stationary states in thiol metabolism - 2 alternative types of cell multiplication: normal and neoplastic]. Biofizika 15, 1065–1073.
Sha, W., Moore, J., Chen, K., Lassaletta, A.D., Yi, C.S., Tyson, J.J., and Sible, J.C. (2003). Hysteresis drives cell-cycle transitions in Xenopus laevis egg extracts. Proc. Natl. Acad. Sci. USA 100, 975–980.
Skotheim, J.M., Di Talia, S., Siggia, E.D., and Cross, F.R. (2008). Positive feedback of G1 cyclins ensures coherent cell cycle entry. Nature 454, 291–296.
Spellman, P.T., Sherlock, G., Zhang, M.Q., Iyer, V.R., Anders, K., Eisen, M.B., Brown, P.O., Botstein, D., and Futcher, B. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9, 3273–3297.
Srividhya, J., and Gopinathan, M.S. (2006). A simple time delay model for eukaryotic cell cycle. J. Theor. Biol. 241, 617–627.
Steuer, R. (2004). Effects of stochasticity in models of the cell cycle: from quantized cycle times to noise-induced oscillations. J. Theor. Biol. 228, 293–301.
Stricker, J., Cookson, S., Bennett, M.R., Mather, W.H., Tsimring, L.S., and Hasty, J. (2008). A fast, robust and tunable synthetic gene oscillator. Nature 456, 516–519.
Strogatz, S.H. (1994). Nonlinear dynamics and chaos: with applications to physics, biology, chemistry, and engineering (Cambridge, MA: Westview Press).
Tomita, J., Nakajima, M., Kondo, T., and Iwasaki, H. (2005). No transcription-translation feedback in circadian rhythm of KaiC phosphorylation. Science 307, 251–254.
Tsai, T.Y., Choi, Y.S., Ma, W., Pomerening, J.R., Tang, C., and Ferrell, J.E., Jr. (2008). Robust, tunable biological oscillations from interlinked positive and negative feedback loops. Science 321, 126–129.
Tyson, J.J. (1991). Modeling the cell division cycle: cdc2 and cyclin interactions. Proc. Natl. Acad. Sci. USA 88, 7328–7332.
Tyson, J., and Kauffman, S. (1975). Control of mitosis by a continuous biochemical oscillation: Synchronization; spatially inhomogeneous oscillations. J. Math. Biol. 1, 289–310.
Tyson, J.J., Chen, K.C., and Novak, B. (2003). Sniffers, buzzers, toggles and blinkers: dynamics of regulatory and signaling pathways in the cell. Curr. Opin. Cell Biol. 15, 221–231.
Yao, G., Lee, T.J., Mori, S., Nevins, J.R., and You, L. (2008). A bistable Rb-E2F switch underlies the restriction point. Nat. Cell Biol. 10, 476–482.
Zhang, Y., Qian, M., Ouyang, Q., Deng, M., Li, F., and Tang, C. (2006). A stochastic model of the yeast cell cycle network. Physica D 219, 35–39.
Zwicker, D., Lubensky, D.K., and ten Wolde, P.R. (2010). Robust circadian clocks from coupled protein-modification and transcription-translation cycles. Proc. Natl. Acad. Sci. USA 107, 22540–22545.

Cell 144, March 18, 2011 ©2011 Elsevier Inc.

Leading Edge Review

Cellular Decision Making and Biological Noise: From Microbes to Mammals

Gábor Balázsi,1 Alexander van Oudenaarden,2 and James J. Collins3,4,*
1Department of Systems Biology–Unit 950, The University of Texas MD Anderson Cancer Center, 7435 Fannin Street, Houston, TX 77054, USA
2Departments of Physics and Biology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
3Howard Hughes Medical Institute, Department of Biomedical Engineering and Center for BioDynamics, Boston University, Boston, MA 02215, USA
4Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02215, USA
*Correspondence: [email protected]
DOI 10.1016/j.cell.2011.01.030

Cellular decision making is the process whereby cells assume different, functionally important and heritable fates without an associated genetic or environmental difference. Such stochastic cell fate decisions generate nongenetic cellular diversity, which may be critical for metazoan development as well as optimized microbial resource utilization and survival in a fluctuating, frequently stressful environment. Here, we review several examples of cellular decision making from viruses, bacteria, yeast, lower metazoans, and mammals, highlighting the role of regulatory network structure and molecular noise. We propose that cellular decision making is one of at least three key processes underlying development at various scales of biological organization.

Introduction

If we, humans, want to control living cells, two strategies are typically available: modifying their genome or changing the environment in which they reside. Does this mean that cells with identical genomes exposed to the same (possibly time-dependent) environment will necessarily have identical phenotypes? Not at all, for reasons that are still not entirely clear. When cells assume different, functionally important and heritable fates without an associated genetic or environmental difference, cellular decision making occurs. This includes asymmetric cell divisions as well as spontaneous differentiation of isogenic cells exposed to the same environment. Specific environmental or genetic cues may bias the process, causing certain cellular fates to be more frequently chosen (as when tossing identically biased coins). Still, the outcome of cellular decision making for individual cells is a priori unknown.

A growing number of cell types are being described as capable of decision making under various circumstances, suggesting that such cellular choices are widespread in all organisms. What are the molecular mechanisms underlying the decisions of various cell types, and why are such decisions so common? We hope to suggest answers to these questions here by considering examples at increasing levels of biological complexity, from viruses to mammals. Such a comparative overview may reveal common themes across different domains of life and may offer clues about the significance of cellular decision making at increasing levels of biological complexity (Maynard Smith and Szathmáry, 1995).

Balls rolling down a slanted landscape with bifurcating valleys (Waddington and Kacser, 1957) have been widely and repeatedly used for several decades as a pictorial illustration of differentiation in multicellular development. Despite its suggestive qualities and repeated use, it has been largely unclear what the valleys and peaks represent in the illustration called “Waddington’s epigenetic landscape.” The increasingly quantitative characterization of gene regulation at the single-cell level is now enabling the computation of Waddington’s landscape (Figure 1), which can serve as a general illustration of an emerging theoretical framework for cellular decision making. Assuming for a moment that cellular states can be represented by the concentration of a single molecule, the horizontal axes in Figure 1 will correspond to the concentration of this molecule and a time-dependent environmental factor, respectively, whereas the vertical dimension corresponds to a potential that governs cellular dynamics. Cells illustrated as spheres will tend to slide down along the concentration axis (pointing from left to right) toward local minima (stable cell states) on this landscape while they also progress toward the observer in time, as a time-dependent environmental factor continuously reshapes the geography of the landscape. Based solely on these considerations, identical cells released from the same point on Waddington’s landscape will follow indistinguishable trajectories, precluding cellular decision making and differentiation. On the other hand, cells released from distinct but nearby points can move to different minima as the bifurcating valleys amplify pre-existing positional differences. According to this deterministic interpretation, cellular decision making and differentiation are completely explained by pre-existing phenotypic differences within isogenic cell populations.

Extensive theoretical and experimental work has started to seriously challenge this simplistic deterministic view, as it is

Figure 1. Illustration of Cellular Decision Making on a Molecular Potential Landscape
The landscape (projected onto the concentration of a specific molecule) is reshaped as the environment changes in time. The blue ball represents a cell that, under the influence of a changing environment, can assume three different fates at the proximal edge of the landscape (white balls at the end of the time course). Even in a constant environment, cells can transition between local minima due to random perturbations to the landscape (intrinsic molecular noise).
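The landscape picture of Figure 1 can be made concrete with a small numerical sketch. The code below is only an illustration under stated assumptions, not a model from the studies cited here: it follows one coordinate `x` (standing in for a molecular concentration) on an assumed double-well potential U(x) = x^4/4 - x^2/2 in a constant environment, using simple Euler-Maruyama steps. The potential, the noise strength, the step size, and the function names `force`, `simulate`, and `count_switches` are all hypothetical choices made for this example.

```python
import math
import random


def force(x):
    """Downhill force -dU/dx for the assumed potential U(x) = x**4/4 - x**2/2."""
    return x - x ** 3


def simulate(x0, noise, steps=20000, dt=0.01, seed=0):
    """Overdamped motion on the landscape with optional random kicks.

    noise = 0 gives the purely deterministic "ball rolling downhill" picture;
    noise > 0 adds Gaussian perturbations of strength sqrt(2 * noise * dt).
    """
    rng = random.Random(seed)
    kick = math.sqrt(2.0 * noise * dt)
    x = x0
    path = [x]
    for _ in range(steps):
        x += force(x) * dt + kick * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def count_switches(path, threshold=0.5):
    """Count moves between the left valley (x < -threshold) and the right valley (x > threshold)."""
    switches = 0
    state = 0  # 0 = no valley assigned yet, -1 = left valley, +1 = right valley
    for x in path:
        if x > threshold:
            new_state = 1
        elif x < -threshold:
            new_state = -1
        else:
            new_state = state  # between the valleys: keep the last assignment
        if state != 0 and new_state != state:
            switches += 1
        state = new_state
    return switches


if __name__ == "__main__":
    # Without noise, identical starting points give identical trajectories.
    print(simulate(0.1, 0.0)[-1], simulate(-0.1, 0.0)[-1])
    # With noise, a cell in a constant environment still hops between valleys.
    print(count_switches(simulate(0.1, 0.5, seed=1)))
```

With `noise=0.0`, two cells released from the same point remain indistinguishable, and cells released on opposite sides of the ridge settle into different valleys. With `noise > 0`, a single cell crosses the barrier repeatedly, which is the qualitative behavior the caption attributes to intrinsic molecular noise.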

becoming clear that at least four critical revisions to Waddington’s picture are needed to properly describe cellular dynamics. First, in reality, the landscape is high-dimensional, defined by all intracellular molecular concentrations and multiple relevant environmental factors, and is not a potential in the usual sense. For this reason, cyclic flows (eddies) may exist that move cells around on closed trajectories in concentration space, even if the local geography is completely even (Wang et al., 2008). Second, the landscape is under the constant influence of omnipresent molecular noise (Kaern et al., 2005; Maheshri and O’Shea, 2007; Rao et al., 2002): stochastic “seismic vibrations” of varying amplitudes and spectra, specific to each location on the landscape (Figure 1). Third, the landscape is not rigid: cells themselves may reshape the geography due to cell-cell interactions (Waters and Bassler, 2005) and the growth rate dependence of protein concentrations (Klumpp et al., 2009; Tan et al., 2009). Last, but not least, growth rate differences between various cellular states reshape the landscape, lowering locations of high fitness and elevating points with reduced fitness as fast growth “overpopulates” certain locations and thereby deepens the landscape. Therefore, Waddington’s landscape must be integrated with a nongenetic version of Sewall Wright’s fitness landscape for genetically identical individuals (Pál and Miklós, 1999).

For the purposes of this review, intrinsic gene expression noise (Blake et al., 2003; Elowitz et al., 2002; Ozbudak et al., 2002) is the most critical component missing from Waddington’s picture. The reason is that even identical cells released from the same location in Figure 1 will feel the perturbing effects of omnipresent random fluctuations at every point on their way. Random noise will shake them apart, modifying their trajectories and forcing them to cross barriers, diffuse along plateaus, and find new local minima. Thus, intrinsic noise enables the phenotypic diversification of completely identical cells exposed to the same environment and further facilitates cellular decision making for cells already slightly different when released onto Waddington’s landscape. Moreover, according to the concepts of escape rate theory (Hänggi et al., 1984; Mehta et al., 2008; Walczak et al., 2005), even cells maintained in a constant environment will have limited residence time around each local minimum (valley) on the landscape, as noise can induce repeated transitions between various cellular states.

Why is cellular decision making so widespread, and when could it confer advantages compared to more deterministic cell fate scenarios? Considering that noise is unavoidable whenever a few copies of a certain molecule react with others inside small volumes (as is the case of DNA inside cells), nongenetic diversity should be very common in the cellular world. Because noise reduction requires high intracellular concentrations or costly negative feedback loops, it should be more surprising if a cellular process is not noisy than if it is. However, not all phenotypic diversity is functionally important and heritable across cell divisions and may therefore not classify as cellular decision making. Still, some cellular processes are noisier than expected based on Poissonian protein synthesis and degradation (Newman et al., 2006), or the resulting cellular states are heritable across several cell cycles (see below), arguing for functionality as the reason for their existence.

The need for stochastic differentiation appears when individual cells are unable to fully adapt to their environment. For example, photosynthesis and nitrogen fixation are essential but mutually exclusive functions in many cell types. To resolve this dilemma, many cyanobacteria dedicate a subpopulation of cells entirely to nitrogen fixation while the rest of the cells remain photosynthetic (Wolk, 1996), thereby ensuring that the cell population can simultaneously fix carbon and nitrogen. The segregation of somatic cells from germ cells is another classic example in which the tasks of locomotion and replication are allocated to different subpopulations (Kirk, 2005). Stochastic differentiation into a growth-arrested but stress-resistant state (such as a spore) may optimize survival in an uncertain, frequently stressful environment by segregating two essential tasks: growth in the absence of stress and survival in the presence of stress. Theoretical work has demonstrated the advantage of phenotypic specialization in a cell population when the added benefits from two vital tasks are smaller than the cost for one cell to perform both tasks (Wahl, 2002). Theory has also shown that a population of cells capable of random phenotypic switching can have an advantage in a fluctuating environment (Kussell

and Leibler, 2005; Thattai and van Oudenaarden, 2004; Wolf et al., 2005). Recent experiments confirmed these predictions, showing that noise can aid survival in severe stress (Blake et al., 2006), can optimize the efficiency of resource uptake during starvation (Çagatay et al., 2009), and can optimize survival in specific fluctuating environments (Acar et al., 2008).

Still, the optimality of stochastic cellular decision making in a well-defined environment does not guarantee that this behavior can evolve. This is because, usually, one of the stochastically chosen cellular states has lower direct fitness (West et al., 2007), rendering the switching strategy vulnerable to invasion by mutants that never switch into the less fit state but nevertheless reap the benefits of cohabitation with faithful switchers. This can be prevented by cheater control (West et al., 2006) or by the regular recurrence of detrimental environmental conditions that suppress or eliminate such mutants. Once stochastic switching became an evolutionarily stable strategy, such task-sharing decisions in clonal microbial populations (Bonner, 2003; Veening et al., 2008a) may have formed the bases of multicellular development. Therefore, the need for optimal resource utilization and survival in a changing environment may have been important driving forces behind the evolution and maintenance of cellular decision making across various domains of life, as suggested by the recent laboratory evolution of bet hedging (Beaumont et al., 2009).

In the following, we will describe how approaches from molecular biology, nonlinear dynamics, and synthetic biology have been used to gain insight into the role of biological noise in cellular decision making, effectuated by a variety of molecular network structures in organisms of increasing biological complexity, including viruses, bacteria, yeast, lower metazoans, and mammals.

Viruses

One of the earliest molecular choices made during the evolution of life on Earth may have been the environment-dependent decision to arrest replication. As the first replicators appeared in the primordial soup (Dawkins, 2006), it may have been advantageous to copy themselves rapidly only in favorable conditions, including an appropriate level of basic building blocks, temperature, acidity, radiation, and preferably no fellow competitors. Moreover, alliances between replicators and sensor molecules may have formed to ensure that replication occurred efficiently and accurately under the appropriate circumstances. Though we may never be certain about the specific events that took place as life began on our planet, viral infections probably offer some clues (Koonin et al., 2006). Viruses are among the simplest nucleic acid-based replicating entities, which presently can only multiply inside of the cells they parasitize. Nevertheless, viral decisions taking place in host cells are in every aspect similar to the bacterial, fungal, and metazoan cellular fate choices described in the subsequent sections, indicating that cellular decision making is a misnomer. In fact, “cellular” decisions are taken by more or less autonomous replicating systems that reside inside and manipulate the behavior of carrier cells to maximize the chance of their own propagation (Dawkins, 2006).

A particularly well-studied virus is bacteriophage lambda (Ptashne, 2004), which preys on the bacterium Escherichia coli and has served as a model for virology for more than half of a century. The infection cycle of this “coliphage” virus begins with attachment to the bacterial cell wall, followed by the injection of viral DNA into the host and the initiation of transcript synthesis. From this moment, two outcomes are possible. The infection either culminates with replication and virus assembly that causes host lysis, or it concludes with the integration of viral DNA into the bacterial chromosome followed by a prolonged period of lysogeny. This is a typical example of decision making at the subcellular level, as viruses with identical genomes infecting isogenic cells can either become lytic or lysogenic. Despite the apparent simplicity of the viral genome, the story of lambda phage decision making is still not completely written and may hold many surprises.

In a series of papers starting in the 1960s (Ptashne, 1967), Mark Ptashne described two repressors (CI and Cro) that proved to be essential for the lysis-lysogeny decision of phage lambda (Figure 2A). CI and Cro are encoded from two divergent promoters (PRM and PR, respectively) and are controlled by three shared operator sites (OR1, OR2, and OR3) to which either CI or Cro dimers can bind but with different affinities (OR1 > OR2 > OR3 for CI and OR3 > OR2 > OR1 for Cro). CI repressor binding to OR1 has negligible effect on cI transcription, whereas CI binding to OR2 activates and CI binding to OR3 represses cI transcription. CI binding to any operator site represses cro transcription (Figure 2A). Cro represses both its own and cI expression but has a stronger effect on cI. Consequently, CI and Cro mutually repress each other, operating as a natural toggle switch (Gardner et al., 2000) with bistable dynamics (Figures 2B and 2C), augmented with autoregulatory loops. This regulatory structure inspired the first mathematical models of the lambda switch (Shea and Ackers, 1985).

Though the CI-Cro module is most commonly known as the core of the “lambda switch,” it is only the tip of the iceberg of regulatory interactions involved in the lysis/lysogeny decision (Oppenheim et al., 2005; Ptashne, 2004). Additional mechanisms include DNA loop formation that reinforces cro repression, regulation of cI expression by CII and CIII, and antitermination of cro transcript synthesis (Figure 2A). These components were included into a comprehensive stochastic model of the lambda switch (Arkin et al., 1998), which was also the first study to apply the Gillespie algorithm (Gillespie, 1977) for modeling a natural gene network. This seminal work pointed out how stochastic molecular events, originating from the random movement of cellular contents, can trigger decisions on a much larger scale, leading to divergent cellular fates.

Stochastic decision making starts as soon as the first viral gene products appear in the cytoplasm. Cro gets a head start, but CI catches up soon, and both fluctuate due to random transcription-regulatory events. The race continues until the abundance of one of these molecules overwhelms the other, terminally flipping the lambda switch into one of two possible stable states (cro-on/cI-off or cI-on/cro-off). In addition to early stochastic events, many environmental factors can bias stochastic decision making and influence the outcome of infection, including the nutritional state and DNA damage response of the host cell, as well as the number of phages coinfecting the host cell (multiplicity of infection). Therefore, the lambda phage

Figure 2. Viral Decision Making
(A) Gene regulatory network controlling the lambda phage lysis/lysogeny decision consists of the core repressor pair CI and Cro and a number of additional regulators, such as N and CII. Cro and CI mutually repress each other, and CI also activates itself from the OR2 operator site, which results in a structure of nested positive and negative feedback loops. The mutual regulatory effects of CI and Cro are annotated with the number of the OR site corresponding to each particular interaction.
(B) Nullclines for CI and Cro, based on the model from Weitz and colleagues (Weitz et al., 2008), at a multiplicity of infection MOI = 2. Along the CI nullcline, there is no change in CI, and along the Cro nullcline, there is no change in Cro. Neither CI nor Cro changes in the points where the nullclines intersect, which represent steady states. The nullclines intersect in three distinct points, indicating that there are three steady states.
(C) Potential calculated along the Cro nullcline, based on the Fokker-Planck approximation, $\phi = -2\int \frac{f - g}{f + g}\, d[\mathrm{CI}]$, wherein f and g represent CI synthesis and degradation along the Cro nullcline, respectively. Filled circles indicate stable nodes. The gray circle indicates that the middle state is a saddle (unstable along the Cro nullcline but stable along the CI nullcline). Molecular noise will force the system to transition between the two valleys, especially in the beginning of infection when transcripts and proteins are rare and noise is high.
(D) The autoregulation of the Tat transcription factor from HIV was reconstituted by expressing both GFP and Tat from the LTR promoter, which is naturally activated by Tat. The internal ribosomal entry site (IRES) (Pelletier and Sonenberg, 1988) between the two coding regions ensures that GFP and Tat are co-translated from the same mRNA template.
(E) After being sorted based on their expression level as Off, Dim, Mid, and Bright, the cells followed different relaxation patterns: Off remained Off; Dim first trifurcated into Off, Dim, and Bright, and then the Dim peak gradually disappeared; Mid relaxed to Bright; and most of Bright remained Bright, with a small subpopulation relaxing to Low.
(F) Control synthetic gene circuit without feedback.
(G) After sorting, the control gene circuit had a much simpler relaxation pattern. Most cells were Low, which remained Low after sorting, whereas Dim cells mostly remained Dim, with a few of them relaxing to Off. These patterns were interpreted as the hallmarks of excitable dynamics.

has a stochastic switch that is capable of hedging bets in a “smart,” environment-dependent manner, investing in both immediate and future expansions.

The importance of intrinsic noise in the lambda switch was recently questioned by a number of research groups. First, it was shown that the host cell volume plays an important role in the decision, with larger cells being more likely to lyse (St-Pierre and Endy, 2008). This pointed to the concentration of infecting phages (rather than their absolute number) as the critical factor in the outcome of infection. Following theoretical predictions by Weitz et al. (Weitz et al., 2008), Zeng and colleagues explained away even more stochasticity (Zeng et al., 2010), showing that the predictability of infection outcome improves if each phage is assumed to cast its own lysis/lysogeny vote, a unanimous vote being necessary for lysogeny. Importantly, stochasticity was reduced, but not eliminated, in this study, suggesting that, although further details of the phage-host system may be discovered that make the outcome of infection more predictable, intrinsic stochasticity stemming from the random nature of gene expression will remain an important factor to consider.
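The race between two mutually repressing gene products described for the lambda switch can be illustrated with a small stochastic simulation. The following is a minimal sketch, not the model of Arkin et al. (1998) or of Weitz et al. (2008): it applies the Gillespie algorithm to a generic, symmetric two-gene toggle. The species names `A` and `B`, the Hill-type repression function, and every parameter value (`alpha`, `k`, `n`, the unit degradation rate, the end time) are assumptions chosen only to make the example bistable and quick to run.

```python
import random


def hill_repression(repressor, alpha=40.0, k=10.0, n=2):
    """Assumed synthesis rate of one product when repressed by the other."""
    return alpha / (1.0 + (repressor / k) ** n)


def gillespie_toggle(t_end=20.0, seed=0):
    """Simulate one cell from zero copies of A and B; return (A, B) at t_end."""
    rng = random.Random(seed)
    a = b = 0
    t = 0.0
    while True:
        rates = [
            hill_repression(b),  # synthesis of A, repressed by B
            hill_repression(a),  # synthesis of B, repressed by A
            float(a),            # first-order degradation of A
            float(b),            # first-order degradation of B
        ]
        total = sum(rates)
        t += rng.expovariate(total)  # exponential waiting time to the next event
        if t > t_end:
            return a, b
        pick = rng.random() * total  # choose which reaction fires
        if pick < rates[0]:
            a += 1
        elif pick < rates[0] + rates[1]:
            b += 1
        elif pick < rates[0] + rates[1] + rates[2]:
            a -= 1
        else:
            b -= 1


def fate(a, b):
    """Label a cell by whichever product dominates at the end of the run."""
    if a > b:
        return "A"
    if b > a:
        return "B"
    return "tie"


def fate_counts(cells=100, t_end=20.0):
    """Run many identical cells (different random seeds) and tally their fates."""
    counts = {"A": 0, "B": 0, "tie": 0}
    for seed in range(cells):
        counts[fate(*gillespie_toggle(t_end, seed))] += 1
    return counts


if __name__ == "__main__":
    print(fate_counts())
```

Every simulated cell starts from the same state and obeys the same rules, yet the population splits between an A-dominated and a B-dominated fate. Biasing one synthesis rate (for example, a larger `alpha` for one product) would be one simple way to mimic the environmental factors that tilt the decision.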

So, is noise a general factor in viral choices between lysis and dormancy? This seems to be the case, as suggested by recent work on the latency of human immunodeficiency virus (HIV) in CD4+ T cells (Weinberger et al., 2005). After HIV integrates into the host genome, active HIV infections almost always culminate in lysis. However, the site of HIV integration is highly variable and has a strong effect on the resulting expression dynamics. On rare occasions, the integrated virus becomes latent, creating an incurable reservoir that is the main obstacle preventing the elimination of the disease (Han et al., 2007). To determine the mechanism of HIV latency, Weinberger and colleagues focused on the positive autoregulatory loop of the Tat transcription factor as the key component in HIV decision making (Weinberger et al., 2005). The authors built two synthetic gene constructs, the first of which coexpressed the green fluorescent protein (GFP) with Tat from the long terminal repeat (LTR) promoter (positive feedback, Figure 2D), whereas the second consisted of GFP alone transcribed from the same promoter (no feedback, Figure 2F). After integrating these constructs into the genome, the authors monitored the dynamics of GFP expression over several weeks after sorting CD4+ T cells by their fluorescence as either Off, Dim, Mid, or Bright. The relaxation of these sorted cell populations over time (Figures 2E and 2G) was interpreted as a signature of excitable dynamics, when cells perturbed from the stable Off state undergo transient excursions into the Bright regime, from which they return to the Off state. Remarkably, this behavior depended on the site of HIV integration (because most clonal populations initiated from a Bright cell remained Bright, and all Off clones remained Off). Only a small subset of clones exhibited excitable dynamics, suggesting that excitability requires weak basal LTR promoter activity. These findings were in agreement with a simple mathematical model that captured the experimentally observed behavior of these constructs and identified preintegration transcription as the stochastic perturbation that causes the spikes in Tat expression. Further work showed a lack of cooperativity in the response of the LTR promoter to Tat and a rightward shift in the autocorrelation function of GFP expression due to positive feedback (Weinberger et al., 2008), confirming the earlier conclusions that Tat autoregulation does not induce bistability (Weinberger et al., 2005). Instead, futile cycles of acetylation/deacetylation of Tat en route to the LTR promoter act as a dissipative “resistor,” weakening autoregulation and reducing Tat expression to basal levels. The fact that excitable HIV integration clones readily respond to a number of immune response-related external factors suggests that these exceptional integrants may provide the pool of latent HIV infection in resting memory T cells. When highly active antiretroviral therapy eliminates the productive HIV pool, these latent but excitable viruses wait for their chance to reappear as a new infection.

In conclusion, these studies on lambda phage and HIV suggest that viral choices between replication and latency may, in general, be stochastic, driven by random molecular noise within networks characterized by bistable or excitable dynamics. This hints at the possibility that some of the most studied cellular processes such as DNA replication may be based on stochastic decision making inherited from ancient biomolecular circuits, e.g., that autonomously dictate the length of the G1 phase before cell cycle Start (Di Talia et al., 2007). Moreover, these studies on viruses support the idea that “cellular” decisions actually occur at the level of intracellular molecular networks. The outcome of these stochastic decisions is an environment-dependent balance between lysis and lysogeny within viral populations, faithfully encoded by the viral genome and the host environment. Therefore, cellular decision making constitutes a very simple mechanism for pattern formation that does not require cell-cell interactions or intercell communication and can therefore operate from the lowest to the highest levels of biological complexity (from viruses to multicellular eukaryotes), as discussed below.

Bacteria

Among microbes used in the study of unicellular development, Bacillus subtilis leads the pack. Besides its easy genetic manipulation, the main reason for the popularity of this soil bacterium is the variety of developmental choices that it assumes during starvation (Lopez et al., 2009). As nutrients become limiting, B. subtilis gears up to differentiate into spores (nongrowing capsules that are highly resistant to a variety of stresses and starvation), but without a rush. In fact, these bacteria take every possible opportunity to delay sporulation of the entire clonal cell population by exploring a number of alternative options, including extracellular matrix production, motility, cannibalism, nutrient release through cell lysis, cell growth arrest, and DNA uptake (competence). Cells uncommitted to sporulation start growing as soon as one of these alternative strategies enables them to do so or as soon as nutrients become available.

One particular B. subtilis cell fate decision that has attracted much attention recently is the transition to competence, when cells take up extracellular DNA and use it as food or perhaps to integrate it into the genome as a mechanism of increased evolvability under stress (Galhardo et al., 2007). During starvation, only a limited percentage of the clonal bacterial population becomes competent, a decision dictated by the master regulator ComK that activates the genes involved in this developmental program, including itself (Figure 3A). ComK levels are controlled by the protease complex MecA/ClpC/ClpP, which also binds ComS, a factor that is capable of preventing ComK degradation through competitive binding to the protease. Because comS is repressed during competence, these interactions form a negative feedback loop around comK.

Süel and colleagues developed a mathematical model, showing that the nested positive and negative feedback loops enable excitable dynamics (Figures 3B and 3C), generating pulses of ComK protein expression and episodes of competence (Süel et al., 2006). Each of these episodes starts with a transient increase in ComK levels that is amplified through autoregulation, leading to a quick rise to maximal ComK protein expression and transition to competence. This, in turn, leads to comS repression, enabling the protease complex to degrade ComK, terminating the ComK pulse and the episode of competence.

If ComK controls entry into and ComS controls exit from competence, then they should affect different aspects of these transient differentiation events. This was indeed the case, as found by controlled ComK and ComS protein overexpression (Süel et al., 2007). High basal comK expression increased the frequency of competence epochs until the point in which the cells remained permanently competent. On the other hand,

high comS basal expression prolonged the time spent in competence. To further establish the mechanism of competence initiation, the authors ingeniously inhibited cell division while DNA replication continued unaltered. This caused the cell volume to increase, leaving the average ComK concentrations unaffected while lowering the noise in ComK protein expression. In cells of increasing length (and, consequently, decreasing ComK noise), the rate of competence initiation dropped substantially. Consequently, ComK noise plays a crucial role in competence initiation by elevating subthreshold levels of ComK toward a critical point at which positive feedback takes effect to initiate periods of competence, in a manner similar to stochastic resonance (Wiesenfeld and Moss, 1995). Likewise, ComS protein expression noise was found crucial for controlling not just the length, but also the variability of competence episodes. A synthetic gene circuit with equivalent average dynamics to the natural one had much lower variability of competence episodes, which severely compromised the DNA uptake capacity of the cell population (Çagatay et al., 2009).

The crucial role of noise in competence initiation was independently confirmed by another group (Maamar et al., 2007) after successfully decoupling ComK protein expression noise and mean. Although they studied ComK dynamics over a shorter time, Maamar et al. found that entry into competence occurred predominantly during a transient rise in ComK expression around the time of entry into stationary phase.

Figure 3. Competence Initiation in B. subtilis (A) The gene regulatory network controlling entry into competence consists of the master regulator ComK and its indirect activator, ComS. ComK activates its own expression, and ComS is downregulated during competence, which results in a structure of nested positive and negative feedback loops. Regulatory interactions mediating positive and negative feedback are shown in red and blue, respectively. Arrowheads indicate activation; blunt arrows indicate repression. (B) Nullclines for ComK and ComS, based on the model from Süel et al. (2006). The nullclines intersect in three distinct points, indicating that there are three steady states. (C) Potential calculated along the nullcline d[ComS]/dt = 0, based on the Fokker-Planck approximation, $\phi = -2\int \frac{f-g}{f+g}\, d[\mathrm{ComK}]$, wherein f and g represent comK synthesis and degradation, respectively, along the ComS nullcline. The filled circle on the left indicates a stable steady state. The gray circles in the middle and on the right indicate saddle points: the middle one is unstable along the ComS nullcline (it is sitting on a “crest” in the potential), whereas the one on the right is unstable along the ComK nullcline. A small perturbation (due to molecular noise) will drive ComK expression from the stable steady state near the other two steady states, initiating transient differentiation into competence, after which the system returns to the steady state on the left.

Competence is a bacterial attempt to delay complete sporulation of the entire clonal cell population. However, if no cells decide to sporulate while the environment continues to worsen, the population will have a decreased chance of survival. Therefore, all bacteria must eventually sporulate, which they do, but only gradually over several days. A recent account of cell fate decision in sporulation conditions reported on cell population size and individual cell length in growing B. subtilis microcolonies (Veening et al., 2008b). After the initial exponential phase, the authors observed a period of slow bacterial growth (diauxic shift), later followed by complete growth arrest for approximately half of a day. Measurements of the growth rate and morphology of individual cells identified three distinct cell fates: spores, vegetative cells, and lysing cells. Interestingly, only the vegetative cells grew during the diauxic phase, accounting alone for all of the growth observed during this period. Cells that later formed spores or lysed did not grow, indicating that their cellular fates bifurcated long before their terminal phenotypes could be determined. This phenotypic bifurcation was independent of cell age but was consistent within “cell families” defined as a cell and all its descendants, implying “transgenerational epigenetic inheritance” (Jablonka and Raz, 2009) of this decision. These heritable cell fate decisions correlated with transcription from a sporulation promoter and were eliminated when the phosphorelay feedback through the master sensor kinase for sporulation was disrupted, demonstrating the importance of posttranslational (rather than transcriptional) positive feedback in the inheritance of cellular fate.

Observing the frequency of stochastic cellular decisions in clonal bacterial populations brings up the interesting question: is there a role for cellular decision making as bacteria join forces in a population-level effort such as in quorum sensing, the ability of bacteria to detect their density and thereby orchestrate population-level behaviors such as luminescence or virulence? This question is currently being addressed using Vibrio harveyi as a model organism. As V. harveyi cells divide and their density exceeds a threshold, they undergo a remarkable transition and

become bioluminescent, which is made possible by their ability to synthesize and detect specific small signaling molecules (autoinducers) through quorum sensing (Waters and Bassler, 2005). Growing cell populations produce more and more autoinducer, which becomes concentrated and turns on bioluminescence, in addition to a number of other functions related to multicellular behavior. Whether all or only some individual cells undergo the decision triggered by quorum sensing remains an important open question that will soon be answered thanks to recent efforts to measure quorum sensing-related gene expression at the single-cell level in newly engineered V. harveyi strains. So far, gene expression measurements for the master quorum-sensing regulator LuxR (Teng et al., 2010) and a small RNA controlling LuxR expression revealed relatively low but autoinducer-dependent noise (Long et al., 2009), which may imply that the V. harveyi quorum-sensing circuit has evolved to reduce noise and bacterial individuality while transitioning to population-level behavior. Indeed, multiple nested negative feedback loops have been identified along the signaling cascade connecting autoinducer receptors to LuxR (Tu et al., 2010), which are network structures capable of noise reduction (Becskei and Serrano, 2000; Nevozhay et al., 2009).

Other examples of cellular decision making in bacteria are the activation of the lactose operon in E. coli and bacterial persistence (phenotypic switching of bacteria to an antibiotic-tolerant state). The first of these has a history of more than five decades (Novick and Weiner, 1957) and will not be discussed here. Regarding bacterial persistence, some critical information is still missing. Persistence of E. coli cells has been observed at the single-cell level (Balaban et al., 2004), but the underlying network and molecular mechanisms may be highly complex and are currently unknown. Conversely, a bistable stress response network has been proposed to underlie persistence in Mycobacteria (Sureka et al., 2008; Tiwari et al., 2010), but the measurements to observe persistent cells and link them to this network have yet to be performed.

In summary, bacteria are masters of cellular decision making, which enables them to hedge bets in a fluctuating, often stressful environment. This may explain their presence in the most extreme and unpredictable environments. Unlike viruses, which typically decide between lysis and lysogeny, genetically identical bacteria can select their fates randomly from a spectrum of multiple options. Fates with lowest direct fitness (such as the spore state) are entered gradually, with a delay, while a variety of alternative options are explored. Bacterial cell decisions involve noisy networks with feedback loops that are capable of bistable or excitable dynamics. Unlike viruses, bacteria can combine cellular decision making with other mechanisms (such as cell-cell communication) to achieve more complex population-level behaviors. Cellular decision making appears suppressed when cell-cell communication becomes prominent (as in quorum sensing), suggesting that microbial individuality is undesired when genetically identical bacteria assume multicellular behaviors. The above examples indicate that many bacterial species are capable of population-level behaviors. Moreover, these examples suggest that the simplest forms of multicellular behavior do not require physical contact or communication between cells.

Yeast

The budding yeast Saccharomyces cerevisiae was the first organism for which the noise of thousands of fluorescently tagged proteins expressed from their native promoters was measured (Bar-Even et al., 2006; Newman et al., 2006). Many yeast genes were found to be significantly noisier than expected based on Poissonian protein synthesis and degradation, suggesting that gene expression bursts may cause the elevated noise of certain genes, which may be beneficial and under selection. These noisy genes had a tendency to be associated with stress responses (Gasch et al., 2000) and often contained a TATA box in their core promoter. Accordingly, TATA box mutations were found to diminish gene expression noise, which lowered the chance of survival in severe stress from which the gene’s protein product offered protection (Blake et al., 2006). Taken together, these results suggested that yeast cells carry an arsenal of genes with unexpectedly noisy expression, supplying the noise needed for phenotypic diversification, which can benefit the population in a fluctuating, often stressful environment.

The galactose uptake system is a relatively well-studied example of a noisy environmental response network. Yeast cells show bimodal expression of galactose uptake (GAL) genes when exposed to a mixture of low glucose and high galactose, indicating that cells decide stochastically between utilizing the limited amount of glucose and growing on galactose (Biggar and Crabtree, 2001). On the other hand, the expression of GAL genes is, in general, more uniform in the absence of glucose when cells are grown on galactose alone or on galactose mixed with raffinose and glycerol. How is it possible for a gene network to generate uniform or bimodal (noisy) expression across the cell population, depending on the stimulus? Essentially, the GAL molecular circuitry consists of three feedback loops. Two of these feedback loops are positive and involve the galactose permease Gal2p and the signaling protein Gal3p. The third feedback loop is negative and involves the inhibitor Gal80p. All three molecules (Gal2p, Gal3p, and Gal80p) are under the control of the activator Gal4p, and they also regulate Gal4p activity and galactose uptake (Figure 4). To understand how this network structure affects cellular decision making, each of these feedback loops was individually disrupted (Acar et al., 2005), and the pattern of GAL gene expression across the cell population was examined after transferring the cells from no galactose- or high galactose-containing medium to various intermediate galactose concentrations.

The wild-type strain, with all three feedback loops intact, had history-dependent gene expression a day after transfer, depending on the original growth condition. Specifically, wild-type cells transferred from high galactose had unimodal GAL expression tracking the galactose concentration, whereas those transferred from low galactose had bimodal expression, indicating that only a subpopulation of cells made the choice to take up galactose. GAL2 deletion had a minimal effect on the GAL expression pattern compared to wild-type cells. On the other hand, disruption of the Gal3p-based positive feedback loop resulted in unimodal GAL gene expression regardless of the conditions prior to transfer, indicating that the cells lost their capacity for decision making. Finally, disruption of the

Gal80p-based negative feedback loop resulted in unimodal, low GAL expression for cells transferred from no galactose, whereas cells transferred from high galactose had a bimodal distribution. Overall, these results indicate that the Gal3p- and Gal80p-based feedback loops play critical roles in cellular decision making and history dependence of GAL expression.

The gene expression patterns observed by Acar and colleagues (Acar et al., 2005) bring up an important concept: cellular memory. Considering that cells make stochastic decisions, how long do they stick to their choices? This question can be reformulated in terms of escape rates and addressed theoretically, as follows: given that a cell resides in a potential well on Waddington’s landscape (Figure 1), how long does it take for it to escape under the influence of noise to a nearby well? Theory predicts that the chance of escape depends on two factors: noise strength and the height of the barrier that needs to be surpassed in order to escape (Hänggi et al., 1984) (noise facilitates, whereas a tall barrier hinders escape). Based on the noise strength and the “geography” of the potential shown in Figure 4, the authors predicted that, by controlling GAL80 expression, they could prolong or shorten the maintenance of high and low GAL expression states in cells with disrupted negative feedback. This was then confirmed experimentally (Acar et al., 2005).

Figure 4. The Galactose Uptake Network in S. cerevisiae (A) Regulatory network controlling galactose uptake. Regulatory interactions mediating positive and negative feedback are shown in red and blue, respectively, and the regulatory interaction that participates in both positive and negative feedback loops is shown in light blue. Solid lines indicate transcriptional regulation; dashed lines indicate nontranscriptional regulation (for example, Gal80p binds to Gal4p and represses Gal4p activator function on GAL promoters). Arrowheads indicate activation; blunt arrows indicate repression. (B) Gal3p synthesis (blue lines) and degradation (red line) rates as functions of Gal3p concentration, for three different galactose concentrations. (C) Potential based on the Fokker-Planck approximation, $\phi = -2\int \frac{f-g}{f+g}\, d[\mathrm{Gal3p}]$, wherein f and g represent Gal3p synthesis and degradation, respectively. There is a stable steady state on the left side of the surface at all galactose concentrations. At sufficiently high galactose concentrations, an additional steady state appears (deep well on the right). As galactose concentration is slowly increased, cells can end up in either potential well (cellular decision making). Moreover, molecular noise can move cells from one potential well to the other, even in constant galactose concentration.

Another remarkable case of yeast cell decision making was described by Paliwal and colleagues, who used clever microfluidic chip design to study the response of individual a mating-type yeast cells to the α pheromone (Paliwal et al., 2007). Pheromone was supplied artificially so as to establish a spatial gradient in which a high number of cells exposed to various pheromone concentrations could be observed. Normally, the pheromone serves as a cue to direct a cell elongation (shmooing) toward a mating partner of opposite type (α). Cells exposed to no pheromone or high pheromone behaved in a uniform fashion (all cells budding and shmooing, respectively). However, a very different scenario emerged for cells that were exposed to identical intermediate pheromone concentrations: a mixture of budding, cell cycle arrested, and shmooing phenotypes was observed, demonstrating cellular decision making. Shmooing cells had significantly higher expression of the transcription factor Fus1p, indicating that at least one observed phenotype was attributable to bimodal gene expression. The network that is responsible for Fus1p activation consists of a mitogen-activated protein kinase (MAPK) pathway that encompasses multiple positive feedback loops, prime candidates for inducing bimodal FUS1 expression. Indeed, disruption of these feedback loops made FUS1 expression and the response to pheromone more uniform across yeast cell colonies, supporting the idea that positive autoregulation can induce cellular decision making.

These examples indicate that cellular decision making is widely utilized by yeast cells to maximize the propagation of their genome in a changing environment. A prominent role of feedback regulation in cellular decision making is emerging from

these examples, although other regulatory mechanisms (such as epigenetic regulation) can also play a role (Octavio et al., 2009). As many genes are noisy when yeast cells grow in suspension, it is interesting to ask how noise and cellular decision making are regulated and exploited during the transition to population-level behaviors such as flocculation due to quorum sensing in yeast cell populations (Smukalla et al., 2008). Yeast cells carry a primitive version of the molecular arsenal utilized during metazoan development, such as homeodomain proteins, morphogens, and the apoptosis pathway. Is noise in these pathways suppressed or elevated in yeast compared to higher eukaryotes? Answering these questions may yield important insights into the regulation of cellular decision making in metazoan development.
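The escape-rate argument used earlier in this section for GAL memory (noise facilitates escape, whereas a tall barrier hinders it) can be illustrated with a short numerical sketch. It assumes the textbook Arrhenius-like scaling in which the mean residence time in a potential well grows exponentially with the ratio of barrier height to noise strength. The function name, the prefactor, and all numbers below are hypothetical; they are not taken from Acar et al. (2005) or Hänggi et al. (1984).

```python
import math


def mean_residence_time(barrier_height, noise_strength, prefactor=1.0):
    """Kramers-type estimate of the mean time spent in a potential well.

    Assumed scaling (illustrative only): tau = prefactor * exp(barrier / noise).
    A taller barrier lengthens the residence time; stronger noise shortens it.
    """
    if noise_strength <= 0:
        raise ValueError("noise_strength must be positive")
    return prefactor * math.exp(barrier_height / noise_strength)


# Hypothetical barrier heights and noise strengths, in arbitrary units.
for barrier in (1.0, 2.0, 4.0):
    for noise in (0.5, 1.0):
        tau = mean_residence_time(barrier, noise)
        print(f"barrier={barrier:.1f}  noise={noise:.1f}  tau={tau:.2f}")
```

In this picture, tuning GAL80 expression would correspond to raising or lowering the barrier between the low and high GAL expression wells, which lengthens or shortens how long cells keep their choice.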

Lower Metazoans

Animals are compact multicellular organisms that grow out from a single zygote cell following a complex embryonic developmental program. During development, increasingly differentiated cell types emerge through sequential rounds of cell division, giving rise from about one thousand (Caenorhabditis elegans) to millions (Drosophila melanogaster) or tens of trillions (humans) of isogenic cells in a fully developed animal. Moreover, these expanding and diversifying cell subpopulations perform remarkably well-defined movements in space and time, such that they arrive at appropriate locations relative to each other, ready to perform their function in the adult animal (Goldstein and Nagy, 2008). Importantly, a few cells embed themselves into specific niches and remain partially undifferentiated, thereby becoming adult stem cells that are capable of replacing differentiated cells that are lost during adult life.

The tremendous population expansion that cells undergo during embryonic development poses a serious danger of error amplification, implying that stochastic cellular decision making should be less common than in unicellular organisms, and control mechanisms should exist to suppress it during development (Arias and Hayward, 2006). Without proper control, a random switch to an incorrect cell fate in the wrong place or at the wrong time could have detrimental consequences for the developing embryo. For this reason, highly stochastic cell fate choices may be restricted to specific cell types and developmental stages, such as the differentiation of adult and embryonic stem cells or the differentiation of cells whose precise location is unimportant (such as retinal patterning and hematopoiesis).

Given the omnipresence of noise, how precise can animal development be, and what noise control mechanisms are utilized? These questions were addressed recently by monitoring the spatial expression pattern of the gap gene hunchback in single D. melanogaster nuclei in response to the morphogen Bicoid (Figures 5A and 5B), which is asymmetrically deposited by the mother to the anterior pole of the egg (Gregor et al., 2007). The fertilized fruit fly zygote initially does not separate into individual cells, allowing Bicoid to freely diffuse away from this pole and create an exponential anterior-posterior gradient along the dividing nuclei. Consequently, single nucleus-wide sections perpendicular to the anterior-posterior axis in the developing embryo will have practically identical, exponentially decreasing morphogen concentrations (Figure 5B), with a 10% drop between neighboring sections, regardless of their location in the embryo (Gregor et al., 2007). This concentration change is successfully and reliably detected by neighboring nuclei, as indicated by their gene expression pattern (Holloway et al., 2006). How is it possible to achieve this precision?

Figure 5. Cell Fate Specification during Lower Metazoan Development (A) The morphogen Bicoid regulates hunchback expression during fruit fly development, setting up the scene for subsequent patterning of the embryo. (B) Bicoid and Hunchback concentrations along the anterior-posterior axis of the fruit fly embryo (length: 500 μm), according to the measurements by Gregor and colleagues (Gregor et al., 2007). The Bicoid concentration (red) is exponentially decreasing toward the posterior end, with a length constant of 500 μm, and is “read out” by Hunchback (blue) with a 10% relative error rate according to the average dose-response relationship $\mathrm{Hb}/\mathrm{Hb}_{max} = (\mathrm{Bcd}/\mathrm{Bcd}_{1/2})^5 / [1 + (\mathrm{Bcd}/\mathrm{Bcd}_{1/2})^5]$. (C) Gene regulatory network controlling intestinal cell fate specification during Caenorhabditis elegans development.

Among other genes, hunchback expression represents a critical readout of Bicoid concentration (Figure 5A), restricting future segments in the larva, and later the adult fly, to their appropriate locations. Hunchback expression levels showed sigmoidal morphogen dependence, indicating highly cooperative activation by Bicoid (Figure 5B). More importantly, Hunchback had remarkably low noise levels in sets of nuclei exposed to identical morphogen concentrations, with a noise peak corresponding to the steepest region of the Hunchback dose response, in which the coefficient of variation was about 20%. Assuming that hunchback expression noise originated from Bicoid fluctuations, the authors used the Bicoid-Hunchback dose-response data to infer the noise in Bicoid concentration, as perceived by individual nuclei, and found a U-shaped error profile along the anterior-posterior axis, with a minimum coefficient of variation of 10%, consistent with earlier work (Holloway et al., 2006). This indicates that cellular decision making is strongly suppressed while setting up hunchback expression along the embryo in response to Bicoid. Individual nuclei have merely

10% autonomy in deciding what Bicoid concentration is in their surroundings and setting up the appropriate response.

Seeking to understand how neighboring nuclei could reliably detect a 10% drop in Bicoid concentration, Gregor and coworkers estimated the averaging time necessary to reduce the error that individual nuclei make in estimating Bicoid concentrations, relying solely on stochastic Bicoid binding/dissociation events to/from its DNA-binding sites. The results were strikingly inconsistent with the temporal averaging hypothesis, requiring nearly 2 hr of averaging to reach 10% relative error. Looking for alternatives, the authors asked whether spatial averaging could also contribute to noise reduction. Measuring the spatial autocorrelation of Hunchback concentration fluctuations around the mean revealed that nuclear communication indeed occurs over approximately five nuclear distances, reducing the averaging time to a single nuclear cycle (3 min). In summary, sets of neighboring nuclei talk to each other and jointly accomplish quick and accurate estimates of the local Bicoid gradient. The identity of the mediator for this nuclear communication remains elusive.

To study spatiotemporal patterns of expression for several genes during a later developmental stage (mesodermal patterning), another group applied quantitative in situ hybridization followed by automated image processing in hundreds of fruit fly embryos (Boettiger and Levine, 2009). Contrary to the high precision of Hunchback response to Bicoid (Gregor et al., 2007), several genes had variable, “dotted” expression across the developing premesodermal surface, indicating that gene expression can be noisy even during multicellular development. This noise was, however, transient, as by the end of the mesodermal patterning phase, all cells expressed these genes at maximal level, indicating that cells can choose autonomously the time of their activation during mesodermal patterning but have no freedom to choose their final expression level at the end of this period. Importantly, another subset of genes behaved differently from their noisy peers and reached their full expression in concert, over a relatively short timescale. Seeking to identify mechanisms underlying this type of “synchrony” for this second subclass of genes, the authors found that their expression was typically regulated through a stalled polymerase. Moreover, one of the low-noise genes, dorsal, had to be present in two copies for maintaining the synchrony and low noise of other genes from the second subclass. The few genes that still maintained low noise after deleting one dorsal copy were found to have shadow enhancers (distal sequences involved in gene activation), which apparently ensure the robustness and reliability of expression for a few highly critical developmental genes. These findings indicate that noisy gene expression and stochastic cell fate decisions would be the default even during metazoan development if intricate regulatory mechanisms did not exist to suppress these variations, ensuring reliable patterning.

One developmental process that fully exploits cellular decision making is the patterning of the fly’s eye. Compound fly eyes consist of hundreds of ommatidia, each of which harbors eight photoreceptors, two of which (R7 and R8) are responsible for color vision. Based on rhodopsin (Rh) expression in these photoreceptors, the corresponding ommatidia can become pale or yellow. The pale/yellow choice occurs in the photoreceptor R7 of each ommatidium: if R7 expresses Rh3, then the ommatidium becomes pale, whereas if it expresses Rh4, the ommatidium becomes yellow. R7’s choice is then transferred to R8 and stabilized through a positive feedback loop between the regulators warts and melted. Pale and yellow ommatidia are randomly localized and make up 30% and 70% of the fly eye, respectively, suggesting that their positioning results from stochastic cell fate choices. This random patterning can be abolished by the deletion or overexpression of the transcription factor spineless, which changes the retinal mosaic into uniformly pale and yellow, respectively (Wernet et al., 2006).

Fruit fly development suggests that gene expression noise and stochastic cell fate choices are carefully controlled and often suppressed, except when they are not disruptive for developmental patterning (Boettiger and Levine, 2009) or when they are exploited to assign random cell fates with desired probabilities (Wernet et al., 2006). What happens if noise suppression fails and fluctuations escape from control? This was examined by monitoring mRNA expression in single cells during C. elegans development (Raj et al., 2010) in a regulatory cascade composed of multiple feed-forward loops controlling the expression of elt-2, a self-activating transcription factor that is critical for intestinal cell fate specification (Figure 5C). After the 65-cell stage, elt-2 expression was high in all cells of all wild-type worm embryos. However, this uniform expression pattern became variable from embryo to embryo and bimodal within individual embryos after mutation of the transcription factor skn-1, which sits at the top of the regulatory hierarchy in Figure 5C, and caused lack of intestinal cells in some, but not all, embryos. Such phenomena, in which genetically identical individuals carrying the same mutation show either a disrupted or a wild-type phenotype, are called partial penetrance.

Counting individual mRNAs in all cells of hundreds of embryos, Raj et al. observed sequential activation of the genes in Figure 5C during development from the top toward the bottom of the hierarchy, with med-1/2 exhibiting an early spike of expression, accompanied by a wider end-3 spike and a prolonged but still transient high expression period of end-1. The outcome of these gene expression events was high and stable elt-2 expression and proper intestinal cell fate specification. By contrast, in the skn-1 mutant, the expression of all genes was diminished or absent, and the majority of embryos had practically no elt-2 expression. Moreover, end-1 expression was highly variable within individual embryos, indicating that skn-1 mutations relieve pre-existing noise suppression, thereby allowing stochastic cell fate decisions to occur. Downregulation of the histone deacetylase hda-1 partially rescued the skn-1 mutant phenotype, indicating that chromatin remodeling was one source of end-1 noise unveiled in skn-1 mutant embryos. However, deletion of upstream transcription factors other than skn-1 (i.e., med-1/2, end-3) did not cause a comparably detrimental reduction of end-1 levels. Taken together, these data suggest that these intermediate transcription factors act in a redundant fashion, buffering noise in the system and ensuring sufficiently high end-1 expression, which can then switch the elt-2 positive feedback loop to the high expression state, ensuring reliable intestinal cell fate specification.
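Returning to the Bicoid readout discussed earlier in this section, the link between Hunchback noise and the inferred Bicoid noise can be sketched with first-order error propagation through the Hill-type dose response quoted in the Figure 5B caption (exponent 5). This is only an illustrative calculation under that assumption, not the analysis performed by Gregor et al. (2007). The function names are hypothetical, and Bicoid is expressed in units of its half-maximal concentration.

```python
def hb_fraction(bcd, n=5):
    """Hb/Hbmax for a Bicoid level given in units of Bcd_1/2 (Hill exponent n)."""
    x = bcd ** n
    return x / (1.0 + x)


def log_sensitivity(bcd, n=5):
    """Logarithmic gain d ln(Hb) / d ln(Bcd), which equals n * (1 - Hb/Hbmax)."""
    return n * (1.0 - hb_fraction(bcd, n))


def inferred_bcd_cv(hb_cv, bcd, n=5):
    """First-order estimate: relative Bcd error = relative Hb error / gain."""
    return hb_cv / log_sensitivity(bcd, n)


# Gain of the readout at a few illustrative Bicoid levels.
for bcd in (0.5, 1.0, 1.5):
    print(f"Bcd={bcd:.1f}  Hb/Hbmax={hb_fraction(bcd):.3f}  "
          f"gain={log_sensitivity(bcd):.2f}")

# A 20% Hunchback CV at the half-maximal point, mapped back to Bicoid.
print(f"inferred Bcd CV: {inferred_bcd_cv(0.20, 1.0):.3f}")
```

At the half-maximal point the gain is n/2 = 2.5, so in this simplified picture a 20% coefficient of variation in Hunchback corresponds to roughly 8% in Bicoid, the same order as the 10% figure quoted above.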

Figure 6. Embryonic Stem Cell Decision Making in Mammals
(A) The Nanog-Oct4 gene regulatory network primes ESC differentiation. Regulatory interactions mediating positive and negative feedback are shown in red and blue, respectively. Regulatory interactions that participate in both positive and negative feedback loops are shown in light blue. Arrowheads indicate activation; blunt arrows indicate repression.
(B) Nullclines for Nanog and Oct4, based on the model from Kalmar et al. (2009). The nullclines intersect only once, corresponding to a single stable steady state.
(C) Potential calculated along the nullcline d[Nanog]/dt = 0, based on the Fokker-Planck approximation. The filled circle on the right indicates the only stable steady state. The gray shaded area is inaccessible because it corresponds to nonphysical solutions. The system undergoes transient excursions to the left (low Nanog concentrations) under the influence of molecular noise. This will prime the ESCs for differentiation if appropriate signals are present.
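For orientation, a potential of the kind plotted in panel (C) can be written down for a generic one-variable birth-death process. This is a hedged sketch and not the calculation of Kalmar et al.: it assumes a single species with copy number x, a production rate f(x), and a degradation rate g(x), all of which are illustrative symbols rather than quantities defined in the source.

```latex
\begin{align}
\frac{\partial P(x,t)}{\partial t}
  &= -\frac{\partial}{\partial x}\Big[\big(f(x)-g(x)\big)P\Big]
     + \frac{1}{2}\frac{\partial^2}{\partial x^2}\Big[\big(f(x)+g(x)\big)P\Big], \\
P_s(x) &\propto \frac{1}{f(x)+g(x)}
     \exp\!\left(2\int^{x}\frac{f(y)-g(y)}{f(y)+g(y)}\,dy\right), \\
\phi(x) &= -\ln P_s(x)
     = -2\int^{x}\frac{f(y)-g(y)}{f(y)+g(y)}\,dy
       + \ln\big(f(x)+g(x)\big) + \text{const}.
\end{align}
```

The first line is the Fokker-Planck approximation with drift f - g and diffusion f + g, the second is its zero-flux stationary solution, and the third defines the potential. Under these assumptions, a single minimum of the potential lies close to the stable steady state where f(x) = g(x), and noise-driven excursions correspond to climbs away from that minimum.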

These examples together indicate that the noise of certain genes is suppressed and buffered by a variety of mechanisms (such as spatial and temporal averaging, stalled polymerases, and redundant regulation) during the development of lower metazoans. Consequently, cellular decision making is generally suppressed unless specifically required for developmental patterning (as for the ommatidia of the compound fly eye) or unless it is harmless (does not interfere with the execution of the overall developmental program). Disruption of the noise control mechanisms unmasks noise and can have detrimental effects on the development of the organism. Noise control during development may resemble the apparent suppression of cellular individuality during quorum sensing, which triggers population-wide behavior in microbes. These and similar open questions can be properly addressed in the context of social evolution theory (West et al., 2006). On the experimental side, much remains to be discovered about the consequences of "letting the noise loose" during development. For example, once the factor that is responsible for spatial averaging across fruit fly nuclei (Gregor et al., 2007) is identified, it would be interesting to examine how fly development tolerates the inhibition of this internuclear communication.

Mammals

Embryonic development is highly conserved among mammals: after a few divisions of the fertilized egg, the resulting cells quickly advance to the blastocyst stage, which manifests as a spherical trophectoderm surrounding the inner cell mass. The inner cell mass consists of pluripotent embryonic stem (ES) cells that are capable of differentiating into any cell type in the future organism. Therefore, efficiently isolating and maintaining ES cells in laboratory conditions holds exceptional potential for future medical applications.

However, to truly exploit the pluripotency of stem cells, it is essential to understand and control the processes underlying their differentiation into various tissues. Moreover, the recent success of reverting differentiated cells into induced pluripotent stem (iPS) cells (Takahashi and Yamanaka, 2006) poses further questions about the efficiency and stability of this reversal. Is differentiation into specific cell types solely the result of cellular decision making, or is it somewhat controllable? To what degree is differentiation reversible, and can the rate of induced pluripotency be increased? And what is the role of noise in pluripotency?

Nanog is a critical pluripotency marker whose expression is lost during ES cell differentiation, and it is maintained at a high level only in pluripotent cells. Following in the footsteps of Chambers et al., who showed stochastic Nanog expression corresponding to attempts of ES cell differentiation (Chambers et al., 2007), Kalmar et al. monitored single ES cells and embryonal carcinoma (EC) cells to better understand Nanog dynamics (Kalmar et al., 2009). Both cell lines had a surprisingly strong, bimodal heterogeneity of Nanog expression that involved transitions between the high and low expression states. Consistent with Nanog's function, cells with low expression responded better to differentiation signals. Analyzing the dynamics of the gene regulatory network controlling Nanog expression (Figure 6A), the authors suggested that the system was excitable rather than bistable, giving rise to a small ES cell subpopulation with low Nanog expression through occasional random excursions from the high to the low expression state (Figures 6B and 6C). Though this expression pattern is opposite to ComK dynamics during competence initiation in B. subtilis, it relies on a gene regulatory network of similar structure, involving nested positive and negative feedback loops, namely: mutual Oct4 and Nanog activation, Oct4 and Nanog autoregulation, and Nanog repression by Oct4. However, because the network underlying ES cell pluripotency is not completely known, it cannot yet be excluded that the high- and low-Nanog subpopulations result from noise-induced transitions in a bistable system (Chickarmane et al., 2006; Glauche et al., 2010; Kalmar et al., 2009). Indeed, the source of noise driving Nanog excursions into the low expression state remains elusive, especially considering that high molecular levels are often associated with low noise. Gene expression bursts (Raj et al., 2006) may offer a solution, as highly expressed proteins can be noisy provided that they are expressed in bursts (Newman et al., 2006).
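The point about bursts can be checked with a small stochastic simulation. The following is a minimal sketch under stated assumptions, not the model of Raj et al. or Newman et al.: proteins are produced in geometrically distributed bursts and degraded one molecule at a time, and the function name `simulate` and all rate values are hypothetical. It compares two scenarios with the same mean copy number, one producing single molecules and one producing large bursts.

```python
import math
import random

def simulate(burst_rate, burst_mean, decay=1.0, t_end=1000.0, seed=1):
    """Gillespie simulation of bursty production and first-order decay.

    burst_mean=None means one molecule per production event (no bursts).
    Returns the time-averaged mean copy number and Fano factor (variance/mean).
    """
    rng = random.Random(seed)
    # Geometric burst sizes on {0, 1, 2, ...} with the requested mean.
    p = None if burst_mean is None else 1.0 / (1.0 + burst_mean)
    n = int(burst_rate * (burst_mean or 1.0) / decay)  # start near the mean
    t, s1, s2 = 0.0, 0.0, 0.0   # time and time-weighted sums of n and n^2
    while True:
        total = burst_rate + decay * n          # total event rate
        dt = rng.expovariate(total)             # waiting time to next event
        if t + dt >= t_end:                     # clip the last interval
            s1 += n * (t_end - t)
            s2 += n * n * (t_end - t)
            break
        s1 += n * dt
        s2 += n * n * dt
        t += dt
        if rng.random() * total < burst_rate:   # production event
            if p is None:
                n += 1
            else:
                n += int(math.log(1.0 - rng.random()) / math.log(1.0 - p))
        else:                                   # degradation event
            n -= 1
    mean = s1 / t_end
    return mean, (s2 / t_end - mean ** 2) / mean

# Same mean (about 100 molecules), different production modes.
steady_mean, steady_fano = simulate(burst_rate=100.0, burst_mean=None)
bursty_mean, bursty_fano = simulate(burst_rate=5.0, burst_mean=20.0)
print(f"steady: mean={steady_mean:.1f}, Fano={steady_fano:.2f}")
print(f"bursty: mean={bursty_mean:.1f}, Fano={bursty_fano:.2f}")
```

Under these assumptions the Fano factor is expected to stay near 1 for single-molecule production and to grow with the mean burst size, so a highly expressed protein can remain noisy when its production is bursty.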

Differentiation is accompanied by loss of Nanog expression, in addition to downregulation of Oct4 and Sox2, the other transcription factors responsible for the maintenance of Nanog expression and pluripotency. Contrary to the early belief that differentiated cells cannot return to the pluripotent state, Takahashi and Yamanaka (2006) found that controlled upregulation of Oct4, Sox2, Klf4, and c-Myc can convert fully differentiated cells into iPS cells. However, such iPS cells were remarkably difficult to obtain and appeared as only a minuscule percentage in large differentiated cell populations exposed to identical genetic and environmental perturbations. To explain the enigmatic source of iPS cells, two possible scenarios for their generation were proposed (Yamanaka, 2009): the elite model assumed pre-existing differences responsible for reversal to the iPS cell state, whereas the stochastic model assumed that reversal occurred by random chance, even without any pre-existing differences. The dichotomy of these models is analogous to the contrasting views of deterministic versus stochastic dynamics on Waddington's landscape, as well as the recent controversy on the predictability of the lambda switch (St-Pierre and Endy, 2008; Zeng et al., 2010).

A recent study set out to test experimentally the validity of the elite versus the stochastic model in iPS cell induction (Hanna et al., 2009). Differentiated murine B cells were identically prepared to harbor inducible copies of Oct4, Sox2, Klf4, and c-Myc and to express Nanog-GFP once reversal to the iPS state occurred. A large number of clonal populations established from such B cells were maintained in constant conditions continuously for several months, and the appearance of iPS cells was monitored over time. The first iPS cells appeared after 2–3 weeks, followed by other iPS reversals as time progressed. Toward the end of the experiment, nearly every clonal population (93%) had a significant number of iPS cells, demonstrating that obtaining the iPS state is just a matter of time and patience, as some descendants of every B cell were capable of returning to the pluripotent state (also confirmed by their ability to generate teratomas and chimaeras). These findings strongly support the stochastic model of induced pluripotency. The authors also studied the influence of inhibiting the p53/p21 pathway or overexpressing Lin28 or Nanog (in combination with all of the iPS-inducing factors Oct4, Sox2, Klf4, and c-Myc) on the speed of reversal to the iPS state. All of these additional perturbations were found to increase the rate of reversals to the iPS state but for different reasons. Whereas p53/p21 inhibition and Lin28 overexpression increased the cell division rate and had an effect by raising the B cell population size while leaving the reversal rate per individual B cell unaffected, Nanog overexpression had a significant effect even after adjusting for growth rate differences.

Considering these studies demonstrating the role of noise in ES cell differentiation and the induction of pluripotency, it is intriguing to ask whether there is a role for cellular decision making in adult mammals. One of the first studies to address this question focused on adult progenitor cells (a multipotent hematopoietic stem cell line) (Chang et al., 2008), observing that the expression of the stem cell marker Sca-1 varied over three orders of magnitude across this cell population. Sorting the cells into distinct subpopulations based on their expression revealed that the variability in Sca-1 levels was dynamic: all sorted populations relaxed to the original distribution in 9 days. The variability was found to reflect predisposition for certain cell fates because cells with low Sca-1 expression had relatively high expression of the erythroid differentiation factor Gata1 and lower expression of the myeloid differentiation factor PU.1. Accordingly, upon stimulation with erythropoietin, low Sca-1-expressing cells differentiated much faster into erythrocytes than their peers with high Sca-1 expression. Moreover, the differences among the original pluripotent stem cells were not restricted to these two differentiation factors: microarray analysis revealed additional genome-wide differences in gene expression between three subpopulations sorted by their Sca-1 expression (Sca-1-low, Sca-1-mid, and Sca-1-high).

In addition to cell differentiation, one of the most important processes recently shown to rely on cellular decision making is apoptosis (Spencer et al., 2009). These authors followed by microscopy the fate of sister cell lineages exposed to a "mortal" agent: tumor necrosis factor-related apoptosis inducing ligand (TRAIL) in two clonal cell lines (HeLa and MCF10A). A striking heterogeneity in cell fate was observed. Some cells never died, and those that died showed a highly variable time between TRAIL exposure and commitment to programmed cell death (indicated by caspase activation or mitochondrial outer-membrane permeabilization). Moreover, sister cells that died soon after TRAIL exposure showed synchronous commitment to apoptosis, whereas those that died later showed gradually decreasing correlation between their times of death, indicating that these suicidal decisions depended on factors inherited from the mother cell that gradually and stochastically diverged as daughter cells divided over time. Measuring the concentrations of five apoptosis-related proteins in single cells, together with a mathematical model of TRAIL-induced apoptosis, allowed the authors to conclude that most stochastic variation in the commitment to cell death was due to initiator procaspase activity that cleaves the apoptotic regulator BH3 interacting domain death agonist (BID) into the truncated form tBID. When tBID hits a threshold, this sets off an irreversible avalanche of molecular interactions that culminate in apoptosis.

In summary, these examples from mammalian cells indicate that cellular decision making underlies the most basic cellular processes in some of the most complex organisms, relying on regulatory networks with dynamics similar to those found in lower metazoans and microbes. However, the exact structure of the regulatory mechanisms controlling mammalian cell decisions is much less understood than for lower organisms and may involve cytoskeleton dynamics (Ambravaneswaran et al., 2010), subcellular localization, posttranslational modification, microRNA-based regulation, or other yet unknown mechanisms. Moreover, the studies discussed above were conducted in cell lines, and not actual mammals, and very little is known about mammalian cell fate choices in vivo. To start overcoming this gap, it will be important to compare and analyze cellular decision making from microbes and lower metazoans from an evolutionary perspective, hoping to learn lessons applicable to mammals.

Conclusions, Challenges, and Open Questions

Here, we reviewed several examples of cellular decision making at multiple levels of biological organization. The generality of this

phenomenon suggests that we are dealing with a fundamental biological property, which many organisms evolved to utilize due to the benefits of task allocation in isogenic cell populations. Cellular decision making, environmental sensing, and cell-cell communication are three key processes underlying pattern formation and development from microbes to mammals. Moreover, viral decision making suggests that some form of random diversification may have been present even before cells existed. In fact, the phrase "cellular decision making" is an oxymoron because these decisions actually occur at the level of gene regulatory networks such as the ones highlighted in this Review. Cells only provide microscopic meeting places for the real key players: genes connected into regulatory networks (Dawkins, 2006).

Several conclusions can be drawn from the examples discussed above. First, cellular decision making is frequently based on networks with multiple nested feedback loops, at least one of which is positive. The role of these feedback loops in various decision-making circuits remains to be determined, but it appears that positive feedback makes cellular decisions stable, whereas negative feedback makes them more easily reversible. Studying the dynamics of multiple feedback loops and their role in differentiation and development has much insight to offer (Brandman et al., 2005; Ray and Igoshin, 2010; Tiwari et al., 2010). Second, these networks appear to operate in parameter regimes enabling either bistable or excitable dynamics. Third, cellular decision making relies on intrinsic molecular noise, which induces transitions between steady states in bistable systems and transient excursions of gene expression in excitable systems. Fourth, as a consequence of the above, all cellular decisions are reversible from a theoretical point of view, although, in practice, this may not occur due to the irreversibility of secondary effects triggered by cellular decision making (such as cell lysis or apoptosis).

The importance of intrinsic noise in cellular decision making has been questioned in a number of recent papers, which found that pre-existing differences in cell size, virus copy number, microenvironments, etc., may explain cell fate decisions to a significant degree (St-Pierre and Endy, 2008; Weitz et al., 2008). However, whereas the variability in cell-fate choices was somewhat reduced after accounting for certain newly identified factors, viral decisions were far from entirely deterministic (Zeng et al., 2010). Though it may be tempting to expect that increasingly detailed measurements of the structure and properties of single cells may enable the exact prediction of cell fate, this hope is unlikely to be fully realized. Imagine for a moment that we could find two cells of exactly the same size and molecular composition and place them into the same environment. These cells could then theoretically have the same fate if all of their corresponding molecules were in identical positions and had identical velocities at a given time. However, this condition can never be satisfied in practice because the probability of finding all of the molecules in the same state (position, velocity, etc.) is infinitesimally small. Therefore, noise is inherent to gene networks confined to small compartments, such as cells or artificial microscopic compartments (Doktycz and Simpson, 2007), and cannot be eliminated. Instead, researchers should strive to understand and control noise increasingly better in order to control cell fate decisions.

Whereas noise makes individual cells somewhat uncontrollable, the same may not be the case for large clonal cell populations, which can develop reliable patterns from unreliable elements due to the sheer power of statistics. For example, repeatedly tossing 100 fair coins will very likely result in nearly equal numbers of heads and tails, even though the fate of the individual coins is unpredictable. In the same way, the fly eye will reliably consist of 30% pale and 70% yellow ommatidia, even though the fate of individual ommatidia prior to patterning is uncertain. Synthetic gene networks capable of controlling gene expression noise (Murphy et al., 2010), the rate of random phenotypic switching (Acar et al., 2005), or the duration variability of transient differentiation episodes (Çagatay et al., 2009) may be useful in the future for adjusting the rate and outcome of cellular differentiation.

Finally, a major challenge is to understand how cellular decision making evolves under well-defined conditions. As discussed above, stochastic cellular fate choices lead to cell population diversity, the simplest possible developmental pattern within isogenic cell populations. Such population-level characteristics are, however, conferred by gene networks carried by every individual cell in these populations, and stochastic diversification may ultimately serve the propagation of their constituent genes (Dawkins, 2006). Phenotypic diversity implies that some individual phenotypic variants will have low direct fitness and will be at a disadvantage without stress, whereas others will perish when the environment becomes stressful. However, in specific cases, this type of sacrifice can be justified by Hamilton's rule (Hamilton, 1964), considering that the relatedness between clonal individual cells is maximal, and the survival of any individual will propagate the same genome. This may allow for kin selection, as suggested by recent theoretical work (Gardner et al., 2007). On the experimental side, laboratory evolution of microbes in fluctuating environments may offer exciting opportunities to address these questions (Cooper and Lenski, 2010), as exemplified by the recent experimental evolution of random phenotypic switching (Beaumont et al., 2009).

More generally, it will be interesting to examine from the perspective of social evolution (West et al., 2006) the formation of complex biological patterns, which may involve altruism (Lee et al., 2010), selfishness, spite, and various forms of cooperation in addition to stochastic cell fate choices. Observation of patterns in growing microbial colonies (Ben-Jacob et al., 1998) has led to the proposal of considering microbes as multicellular organisms (Shapiro, 1998). Though criticized by researchers from the field of social evolution (West et al., 2006), this proposal brings up an interesting question: which microbial patterns are functional, and when can patterns evolve? Because patterns form readily in nonliving systems due to purely physical reasons, it will be interesting to examine, in the context of sociobiology (West et al., 2007), the conditions when a cell population becomes a multicellular organism (Queller and Strassmann, 2009) and whether specific biological patterns have biological function subject to population-level selection.

ACKNOWLEDGMENTS

We would like to thank D. Nevozhay, R.M. Adams, G.B. Mills, O.A. Igoshin, R. Azevedo, J.E. Strassmann, and two anonymous reviewers for their helpful

comments on the manuscript. G.B. was supported by the NIH Director's New Innovator Award Program (grant 1DP2 OD006481-01) and by NSF grant IOS 1021675. A.v.O. was supported by the NIH/NCI Physical Sciences Oncology Center at MIT (U54CA143874) and the NIH Director's Pioneer Award Program (grant 1DP1OD003936). J.J.C. was supported by the NIH Director's Pioneer Award Program (grant DP1 OD00344), NIH grants RC2 HL102815 and RL1 DE019021, the Ellison Medical Foundation, and the Howard Hughes Medical Institute.

REFERENCES

Acar, M., Becskei, A., and van Oudenaarden, A. (2005). Enhancement of cellular memory by reducing stochastic transitions. Nature 435, 228–232.
Acar, M., Mettetal, J.T., and van Oudenaarden, A. (2008). Stochastic switching as a survival strategy in fluctuating environments. Nat. Genet. 40, 471–475.
Ambravaneswaran, V., Wong, I.Y., Aranyosi, A.J., Toner, M., and Irimia, D. (2010). Directional decisions during neutrophil chemotaxis inside bifurcating channels. Integr. Biol. (Camb.) 2, 639–647.
Arias, A.M., and Hayward, P. (2006). Filtering transcriptional noise during development: concepts and mechanisms. Nat. Rev. Genet. 7, 34–44.
Arkin, A., Ross, J., and McAdams, H.H. (1998). Stochastic kinetic analysis of developmental pathway bifurcation in phage lambda-infected Escherichia coli cells. Genetics 149, 1633–1648.
Balaban, N.Q., Merrin, J., Chait, R., Kowalik, L., and Leibler, S. (2004). Bacterial persistence as a phenotypic switch. Science 305, 1622–1625.
Bar-Even, A., Paulsson, J., Maheshri, N., Carmi, M., O'Shea, E., Pilpel, Y., and Barkai, N. (2006). Noise in protein expression scales with natural protein abundance. Nat. Genet. 38, 636–643.
Beaumont, H.J., Gallie, J., Kost, C., Ferguson, G.C., and Rainey, P.B. (2009). Experimental evolution of bet hedging. Nature 462, 90–93.
Becskei, A., and Serrano, L. (2000). Engineering stability in gene networks by autoregulation. Nature 405, 590–593.
Ben-Jacob, E., Cohen, I., and Gutnick, D.L. (1998). Cooperative organization of bacterial colonies: from genotype to morphotype. Annu. Rev. Microbiol. 52, 779–806.
Biggar, S.R., and Crabtree, G.R. (2001). Cell signaling can direct either binary or graded transcriptional responses. EMBO J. 20, 3167–3176.
Blake, W.J., Balázsi, G., Kohanski, M.A., Isaacs, F.J., Murphy, K.F., Kuang, Y., Cantor, C.R., Walt, D.R., and Collins, J.J. (2006). Phenotypic consequences of promoter-mediated transcriptional noise. Mol. Cell 24, 853–865.
Blake, W.J., Kaern, M., Cantor, C.R., and Collins, J.J. (2003). Noise in eukaryotic gene expression. Nature 422, 633–637.
Boettiger, A.N., and Levine, M. (2009). Synchronous and stochastic patterns of gene activation in the Drosophila embryo. Science 325, 471–473.
Bonner, J.T. (2003). On the origin of differentiation. J. Biosci. 28, 523–528.
Brandman, O., Ferrell, J.E., Jr., Li, R., and Meyer, T. (2005). Interlinked fast and slow positive feedback loops drive reliable cell decisions. Science 310, 496–498.
Çagatay, T., Turcotte, M., Elowitz, M.B., Garcia-Ojalvo, J., and Süel, G.M. (2009). Architecture-dependent noise discriminates functionally analogous differentiation circuits. Cell 139, 512–522.
Chambers, I., Silva, J., Colby, D., Nichols, J., Nijmeijer, B., Robertson, M., Vrana, J., Jones, K., Grotewold, L., and Smith, A. (2007). Nanog safeguards pluripotency and mediates germline development. Nature 450, 1230–1234.
Chang, H.H., Hemberg, M., Barahona, M., Ingber, D.E., and Huang, S. (2008). Transcriptome-wide noise controls lineage choice in mammalian progenitor cells. Nature 453, 544–547.
Chickarmane, V., Troein, C., Nuber, U.A., Sauro, H.M., and Peterson, C. (2006). Transcriptional dynamics of the embryonic stem cell switch. PLoS Comput. Biol. 2, e123.
Cooper, T.F., and Lenski, R.E. (2010). Experimental evolution with E. coli in diverse resource environments. I. Fluctuating environments promote divergence of replicate populations. BMC Evol. Biol. 10, 11.
Dawkins, R. (2006). The Selfish Gene: 30th Anniversary Edition, 30th anniversary edn (New York: Oxford University Press).
Di Talia, S., Skotheim, J.M., Bean, J.M., Siggia, E.D., and Cross, F.R. (2007). The effects of molecular noise and size control on variability in the budding yeast cell cycle. Nature 448, 947–951.
Doktycz, M.J., and Simpson, M.L. (2007). Nano-enabled synthetic biology. Mol. Syst. Biol. 3, 125.
Elowitz, M.B., Levine, A.J., Siggia, E.D., and Swain, P.S. (2002). Stochastic gene expression in a single cell. Science 297, 1183–1186.
Galhardo, R.S., Hastings, P.J., and Rosenberg, S.M. (2007). Mutation as a stress response and the regulation of evolvability. Crit. Rev. Biochem. Mol. Biol. 42, 399–435.
Gardner, A., West, S.A., and Griffin, A.S. (2007). Is bacterial persistence a social trait? PLoS ONE 2, e752.
Gardner, T.S., Cantor, C.R., and Collins, J.J. (2000). Construction of a genetic toggle switch in Escherichia coli. Nature 403, 339–342.
Gasch, A.P., Spellman, P.T., Kao, C.M., Carmel-Harel, O., Eisen, M.B., Storz, G., Botstein, D., and Brown, P.O. (2000). Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell 11, 4241–4257.
Gillespie, D.T. (1977). Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81, 2340–2361.
Glauche, I., Herberg, M., and Roeder, I. (2010). Nanog variability and pluripotency regulation of embryonic stem cells - insights from a mathematical model analysis. PLoS ONE 5, e11238.
Goldstein, A.M., and Nagy, N. (2008). A bird's eye view of enteric nervous system development: lessons from the avian embryo. Pediatr. Res. 64, 326–333.
Gregor, T., Tank, D.W., Wieschaus, E.F., and Bialek, W. (2007). Probing the limits to positional information. Cell 130, 153–164.
Hamilton, W.D. (1964). The genetical evolution of social behaviour. I. J. Theor. Biol. 7, 1–16.
Han, Y., Wind-Rotolo, M., Yang, H.C., Siliciano, J.D., and Siliciano, R.F. (2007). Experimental approaches to the study of HIV-1 latency. Nat. Rev. Microbiol. 5, 95–106.
Hänggi, P., Grabert, H., Talkner, P., and Thomas, H. (1984). Bistable systems: Master equation versus Fokker-Planck modeling. Phys. Rev. A 29, 371–378.
Hanna, J., Saha, K., Pando, B., van Zon, J., Lengner, C.J., Creyghton, M.P., van Oudenaarden, A., and Jaenisch, R. (2009). Direct cell reprogramming is a stochastic process amenable to acceleration. Nature 462, 595–601.
Holloway, D.M., Harrison, L.G., Kosman, D., Vanario-Alonso, C.E., and Spirov, A.V. (2006). Analysis of pattern precision shows that Drosophila segmentation develops substantial independence from gradients of maternal gene products. Dev. Dyn. 235, 2949–2960.
Jablonka, E., and Raz, G. (2009). Transgenerational epigenetic inheritance: prevalence, mechanisms, and implications for the study of heredity and evolution. Q. Rev. Biol. 84, 131–176.
Kaern, M., Elston, T.C., Blake, W.J., and Collins, J.J. (2005). Stochasticity in gene expression: from theories to phenotypes. Nat. Rev. Genet. 6, 451–464.
Kalmar, T., Lim, C., Hayward, P., Muñoz-Descalzo, S., Nichols, J., Garcia-Ojalvo, J., and Martinez Arias, A. (2009). Regulated fluctuations in nanog expression mediate cell fate decisions in embryonic stem cells. PLoS Biol. 7, e1000149.
Kirk, D.L. (2005). A twelve-step program for evolving multicellularity and a division of labor. Bioessays 27, 299–310.
Klumpp, S., Zhang, Z., and Hwa, T. (2009). Growth rate-dependent global effects on gene expression in bacteria. Cell 139, 1366–1375.
Koonin, E.V., Senkevich, T.G., and Dolja, V.V. (2006). The ancient Virus World and evolution of cells. Biol. Direct 1, 29.

Kussell, E., and Leibler, S. (2005). Phenotypic diversity, population growth, and information in fluctuating environments. Science 309, 2075–2078.
Lee, H.H., Molla, M.N., Cantor, C.R., and Collins, J.J. (2010). Bacterial charity work leads to population-wide resistance. Nature 467, 82–85.
Long, T., Tu, K.C., Wang, Y., Mehta, P., Ong, N.P., Bassler, B.L., and Wingreen, N.S. (2009). Quantifying the integration of quorum-sensing signals with single-cell resolution. PLoS Biol. 7, e68.
Lopez, D., Vlamakis, H., and Kolter, R. (2009). Generation of multiple cell types in Bacillus subtilis. FEMS Microbiol. Rev. 33, 152–163.
Maamar, H., Raj, A., and Dubnau, D. (2007). Noise in gene expression determines cell fate in Bacillus subtilis. Science 317, 526–529.
Maheshri, N., and O'Shea, E.K. (2007). Living with noisy genes: how cells function reliably with inherent variability in gene expression. Annu. Rev. Biophys. Biomol. Struct. 36, 413–434.
Maynard Smith, J., and Szathmáry, E. (1995). The Major Transitions in Evolution (Oxford: Oxford University Press).
Mehta, P., Mukhopadhyay, R., and Wingreen, N.S. (2008). Exponential sensitivity of noise-driven switching in genetic networks. Phys. Biol. 5, 026005.
Murphy, K.F., Adams, R.M., Wang, X., Balázsi, G., and Collins, J.J. (2010). Tuning and controlling gene expression noise in synthetic gene networks. Nucleic Acids Res. 38, 2712–2726.
Nevozhay, D., Adams, R.M., Murphy, K.F., Josic, K., and Balázsi, G. (2009). Negative autoregulation linearizes the dose-response and suppresses the heterogeneity of gene expression. Proc. Natl. Acad. Sci. USA 106, 5123–5128.
Newman, J.R., Ghaemmaghami, S., Ihmels, J., Breslow, D.K., Noble, M., DeRisi, J.L., and Weissman, J.S. (2006). Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441, 840–846.
Novick, A., and Weiner, M. (1957). Enzyme induction as an all-or-none phenomenon. Proc. Natl. Acad. Sci. USA 43, 553–566.
Octavio, L.M., Gedeon, K., and Maheshri, N. (2009). Epigenetic and conventional regulation is distributed among activators of FLO11 allowing tuning of population-level heterogeneity in its expression. PLoS Genet. 5, e1000673.
Oppenheim, A.B., Kobiler, O., Stavans, J., Court, D.L., and Adhya, S. (2005). Switches in bacteriophage lambda development. Annu. Rev. Genet. 39, 409–429.
Ozbudak, E.M., Thattai, M., Kurtser, I., Grossman, A.D., and van Oudenaarden, A. (2002). Regulation of noise in the expression of a single gene. Nat. Genet. 31, 69–73.
Pál, C., and Miklós, I. (1999). Epigenetic inheritance, genetic assimilation and speciation. J. Theor. Biol. 200, 19–37.
Paliwal, S., Iglesias, P.A., Campbell, K., Hilioti, Z., Groisman, A., and Levchenko, A. (2007). MAPK-mediated bimodal gene expression and adaptive gradient sensing in yeast. Nature 446, 46–51.
Pelletier, J., and Sonenberg, N. (1988). Internal initiation of translation of eukaryotic mRNA directed by a sequence derived from poliovirus RNA. Nature 334, 320–325.
Ptashne, M. (1967). Specific binding of the lambda phage repressor to lambda DNA. Nature 214, 232–234.
Ptashne, M. (2004). A Genetic Switch: Phage Lambda Revisited, Third Edition (Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press).
Queller, D.C., and Strassmann, J.E. (2009). Beyond society: the evolution of organismality. Philos. Trans. R. Soc. Lond. B Biol. Sci. 364, 3143–3155.
Raj, A., Peskin, C.S., Tranchina, D., Vargas, D.Y., and Tyagi, S. (2006). Stochastic mRNA synthesis in mammalian cells. PLoS Biol. 4, e309.
Raj, A., Rifkin, S.A., Andersen, E., and van Oudenaarden, A. (2010). Variability in gene expression underlies incomplete penetrance. Nature 463, 913–918.
Rao, C.V., Wolf, D.M., and Arkin, A.P. (2002). Control, exploitation and tolerance of intracellular noise. Nature 420, 231–237.
Ray, J.C., and Igoshin, O.A. (2010). Adaptable functionality of transcriptional feedback in bacterial two-component systems. PLoS Comput. Biol. 6, e1000676.
Shapiro, J.A. (1998). Thinking about bacterial populations as multicellular organisms. Annu. Rev. Microbiol. 52, 81–104.
Shea, M.A., and Ackers, G.K. (1985). The OR control system of bacteriophage lambda. A physical-chemical model for gene regulation. J. Mol. Biol. 181, 211–230.
Smukalla, S., Caldara, M., Pochet, N., Beauvais, A., Guadagnini, S., Yan, C., Vinces, M.D., Jansen, A., Prevost, M.C., Latgé, J.P., et al. (2008). FLO1 is a variable green beard gene that drives biofilm-like cooperation in budding yeast. Cell 135, 726–737.
Spencer, S.L., Gaudet, S., Albeck, J.G., Burke, J.M., and Sorger, P.K. (2009). Non-genetic origins of cell-to-cell variability in TRAIL-induced apoptosis. Nature 459, 428–432.
St-Pierre, F., and Endy, D. (2008). Determination of cell fate selection during phage lambda infection. Proc. Natl. Acad. Sci. USA 105, 20705–20710.
Süel, G.M., Garcia-Ojalvo, J., Liberman, L.M., and Elowitz, M.B. (2006). An excitable gene regulatory circuit induces transient cellular differentiation. Nature 440, 545–550.
Süel, G.M., Kulkarni, R.P., Dworkin, J., Garcia-Ojalvo, J., and Elowitz, M.B. (2007). Tunability and noise dependence in differentiation dynamics. Science 315, 1716–1719.
Sureka, K., Ghosh, B., Dasgupta, A., Basu, J., Kundu, M., and Bose, I. (2008). Positive feedback and noise activate the stringent response regulator rel in mycobacteria. PLoS ONE 3, e1771.
Takahashi, K., and Yamanaka, S. (2006). Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676.
Tan, C., Marguet, P., and You, L. (2009). Emergent bistability by a growth-modulating positive feedback circuit. Nat. Chem. Biol. 5, 842–848.
Teng, S.W., Wang, Y., Tu, K.C., Long, T., Mehta, P., Wingreen, N.S., Bassler, B.L., and Ong, N.P. (2010). Measurement of the copy number of the master quorum-sensing regulator of a bacterial cell. Biophys. J. 98, 2024–2031.
Thattai, M., and van Oudenaarden, A. (2004). Stochastic gene expression in fluctuating environments. Genetics 167, 523–530.
Tiwari, A., Balázsi, G., Gennaro, M.L., and Igoshin, O.A. (2010). The interplay of multiple feedback loops with post-translational kinetics results in bistability of mycobacterial stress response. Phys. Biol. 7, 036005.
Tu, K.C., Long, T., Svenningsen, S.L., Wingreen, N.S., and Bassler, B.L. (2010). Negative feedback loops involving small regulatory RNAs precisely control the Vibrio harveyi quorum-sensing response. Mol. Cell 37, 567–579.
Veening, J.W., Igoshin, O.A., Eijlander, R.T., Nijland, R., Hamoen, L.W., and Kuipers, O.P. (2008a). Transient heterogeneity in extracellular protease production by Bacillus subtilis. Mol. Syst. Biol. 4, 184.
Veening, J.W., Stewart, E.J., Berngruber, T.W., Taddei, F., Kuipers, O.P., and Hamoen, L.W. (2008b). Bet-hedging and epigenetic inheritance in bacterial cell development. Proc. Natl. Acad. Sci. USA 105, 4393–4398.
Waddington, C.H., and Kacser, H. (1957). The Strategy of the Genes: A Discussion of Some Aspects of Theoretical Biology (London, UK: George Allen & Unwin).
Wahl, L.M. (2002). Evolving the division of labour: generalists, specialists and task allocation. J. Theor. Biol. 219, 371–388.
Walczak, A.M., Onuchic, J.N., and Wolynes, P.G. (2005). Absolute rate theories of epigenetic stability. Proc. Natl. Acad. Sci. USA 102, 18926–18931.
Wang, J., Xu, L., and Wang, E. (2008). Potential landscape and flux framework of nonequilibrium networks: robustness, dissipation, and coherence of biochemical oscillations. Proc. Natl. Acad. Sci. USA 105, 12271–12276.
Waters, C.M., and Bassler, B.L. (2005). Quorum sensing: cell-to-cell communication in bacteria. Annu. Rev. Cell Dev. Biol. 21, 319–346.
Weinberger, L.S., Burnett, J.C., Toettcher, J.E., Arkin, A.P., and Schaffer, D.V. (2005). Stochastic gene expression in a lentiviral positive-feedback loop: HIV-1 Tat fluctuations drive phenotypic diversity. Cell 122, 169–182.
Weinberger, L.S., Dar, R.D., and Simpson, M.L. (2008). Transient-mediated fate determination in a transcriptional circuit of HIV. Nat. Genet. 40, 466–470.

924 Cell 144, March 18, 2011 ª2011 Elsevier Inc. Weitz, J.S., Mileyko, Y., Joh, R.I., and Voit, E.O. (2008). Collective decision Wiesenfeld, K., and Moss, F. (1995). Stochastic resonance and the benefits of making in bacterial viruses. Biophys. J. 95, 2673–2680. noise: from ice ages to crayfish and SQUIDs. Nature 373, 33–36. Wolf, D.M., Vazirani, V.V., and Arkin, A.P. (2005). Diversity in times of adversity: Wernet, M.F., Mazzoni, E.O., Celik, A., Duncan, D.M., Duncan, I., and Desplan, probabilistic strategies in microbial survival games. J. Theor. Biol. 234, C. (2006). Stochastic spineless expression creates the retinal mosaic for 227–253. colour vision. Nature 440, 174–180. Wolk, C.P. (1996). Heterocyst formation. Annu. Rev. Genet. 30, 59–78. West, S.A., Griffin, A.S., and Gardner, A. (2007). Social semantics: altruism, Yamanaka, S. (2009). Elite and stochastic models for induced pluripotent stem cooperation, mutualism, strong reciprocity and group selection. J. Evol. Biol. cell generation. Nature 460, 49–52. 20, 415–432. Zeng, L., Skinner, S.O., Zong, C., Sippy, J., Feiss, M., and Golding, I. (2010). West, S.A., Griffin, A.S., Gardner, A., and Diggle, S.P. (2006). Social evolution Decision making at a subcellular level determines the outcome of bacterio- theory for microorganisms. Nat. Rev. Microbiol. 4, 597–607. phage infection. Cell 141, 682–691.

Cell 144, March 18, 2011 ©2011 Elsevier Inc.

Leading Edge Review

Impulse Control: Temporal Dynamics in Gene Transcription

Nir Yosef1,2 and Aviv Regev1,3,* 1Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA 2Center for Neurologic Diseases, Brigham & Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA 3Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA *Correspondence: [email protected] DOI 10.1016/j.cell.2011.02.015

Regulatory circuits controlling gene expression constantly rewire to adapt to environmental stimuli, differentiation cues, and disease. We review our current understanding of the temporal dynamics of gene expression in eukaryotes and prokaryotes and the molecular mechanisms that shape them. We delineate several prototypical temporal patterns, including “impulse” (or single-pulse) patterns in response to transient environmental stimuli, sustained (or state-transitioning) patterns in response to developmental cues, and oscillating patterns. We focus on impulse responses and their higher-order temporal organization in regulons and cascades and describe how core protein circuits and cis-regulatory sequences in promoters integrate with chromatin architecture to generate these responses.

Introduction

The transcriptional program that controls gene expression in cells and organisms is remarkably flexible, constantly reconfiguring itself to respond and adapt to perturbations. These changes are apparent across a broad range of timescales, from rapid responses to environmental signals (i.e., minutes to hours) to slower events during development and pathogenesis (i.e., hours to days) (Lopez-Maury et al., 2008).

Dissecting these dynamic changes, both functionally and mechanistically, is a fundamental challenge in biology and raises several key questions. What is the scope of temporal patterns of gene expression in biological systems? What functions do different patterns serve? What molecular mechanisms underlie the formation of each pattern, and what is their capacity to process the temporal signal into a specific change in gene expression over time? Finally, are any principles, either functional or mechanistic, shared among temporal responses in distinct timescales?

Recent parallel advances in genomics and cell biology provide an unprecedented opportunity to map dynamic gene expression and decipher its underlying mechanisms. At the same time, live-cell imaging of fluorescent reporter proteins (Locke and Elowitz, 2009) allows us to study gene expression at fine temporal resolution and at the single-cell level. Such studies, when coupled with molecular manipulations and quantitative modeling, can identify basic mechanisms of temporal patterning. Further, genomic technologies provide global insights on the regulation of gene expression by allowing us to measure and perturb many aspects of the regulatory system, such as mRNA levels, protein-promoter interactions (Badis et al., 2009; Lee et al., 2002), or chromatin modification states (Wei et al., 2009; Whitehouse et al., 2007). Finally, emerging methods in synthetic biology, robotics, and microfluidics (Szita et al., 2010) are poised to transform our ability to manipulate cellular inputs and components at unparalleled temporal resolution.

Here, we review recent advances in our understanding of transcriptional dynamics, including the prototypical patterns of temporal mRNA expression and their underlying molecular mechanisms. We identify a small number of prominent temporal patterns, such as single pulse responses (“impulses”), sustained state-transitioning patterns, and oscillations. Focusing on impulse responses, we then present the molecular circuits that generate these patterns, highlighting the prominent role that transcription factor localization, integration of multiple inputs through cis-regulatory elements, and nucleosome occupancy play in tuning the response to a given stimulus. Finally, we discuss the prospect for a unified view of regulatory dynamics across timescales and systems, emphasizing critical directions for further research.

Prototypical Patterns of Temporal Dynamics

What capacity does a cell or organism have to generate temporal patterns of gene expression? Recent studies reveal several key classes of patterns (Figure 1). The first one, indefinite oscillators (Figure 1A), plays integral roles in homeostasis, such as the execution of the cell cycle or circadian rhythm. Other classes of temporal patterns follow an external stimulus. These include impulse (or single-pulse) patterns in response to environmental stimuli (Figures 1B–1D) and sustained (or state-transitioning) patterns in response to developmental stimuli (Figure 1E). Each of these patterns serves a set of interrelated functional goals, including optimizing the investment of cellular responses, temporally compartmentalizing antagonistic processes, and imposing order on the biogenesis of complex biological systems. On a systems-wide scale, the regulation of individual genes is commonly organized at a higher order into regulons, in which a group of genes are controlled by the same transcription factors and, thus, share the same gene expression patterns. In addition, genes can be organized into transcriptional cascades and other patterns in which expression is ordered sequentially. Here, we focus on the impulse-like pattern, specifically its function and integration within transcriptional programs.

Figure 1. Prototypical Patterns of Temporal Dynamics of Gene Expression
Schematic views of gene expression levels (y axis; arbitrary units) over time (x axis) commonly found in cells in steady state or during a response to environmental, developmental, or pathogenic stimuli. Blue and red plots show possible profiles for different genes under each category. Common functions for these gene expression patterns are listed.

Impulse (Single-Pulse) Responses to Environmental Signals

Changes in gene expression in response to perturbations of the surrounding environment, such as heat, salinity, or osmotic pressure, typically follow a characteristic “impulse”-like pattern (Chechik and Koller, 2009; Chechik et al., 2008). Transcript levels spike up or down abruptly following the environmental cue, sustain a new level for a certain period of time (which may or may not depend on the continuation of the cue), and then transition to a new steady state, often similar to the original levels (Figure 1B). Impulse patterns are prevalent in responses to environmental changes in all organisms, from bacteria to mammals (Braun and Brenner, 2004; Gasch et al., 2000; Litvak et al., 2009; Lopez-Maury et al., 2008; Murray et al., 2004). One of the most extensively studied impulse systems is the environmental stress response (ESR) program in yeast. The ESR consists of 900 genes that exhibit short-term changes in transcription levels in response to various environmental stresses (Gasch et al., 2000). The transient impulse pattern of the ESR likely represents an adaptation phase, during which the cell optimizes its internal protein milieu before resuming growth (Gasch et al., 2000). Indeed, many of the downregulated genes in the ESR are associated with protein synthesis, reflecting the characteristic transient suppression in translation initiation and growth (Gasch et al., 2000). The ESR is also associated with the brief induction of genes involved in specific response mechanisms, such as DNA-damage repair, carbohydrate metabolism, and metabolite transport (Capaldi et al., 2008; Gasch et al., 2000). A notable exception to the impulse-like stress response in yeast is the case of starvation, in which the cells initiate more sustained programs, such as quiescence, filamentation, or sporulation (Lopez-Maury et al., 2008).

Transient impulse patterns are also prevalent in mammalian cells (Foster et al., 2007; Litvak et al., 2009; Murray et al., 2004), extending beyond environmental stimuli. For example, when innate immune cells, such as macrophages (Gilchrist et al., 2006; Ramsey et al., 2008) or dendritic cells (Amit et al., 2009), respond to pathogens, expression changes in individual genes follow a clear impulse pattern. These patterns, however, are often coupled to each other, forming multistep transcriptional cascades, in which the products of genes that are induced early in a response affect the expression of downstream targets. These targets, in turn, may exhibit either an impulse pattern or a more sustained one that initiates a long-term change in the cell’s state (Amit et al., 2007a; Murray et al., 2004).

Sign-Sensitive Delay and Persistence Detection in Impulse Responses

Impulse patterns can respond distinctly to the introduction of a signal versus its withdrawal. This differential response results in a “sign-sensitive delay” (Figure 1C), in which the speed of the cell’s response to one “sign-shift” (e.g., from the presence to the absence of a nutrient) is different from that of the complementary shift (e.g., from the absence to the presence of a nutrient).
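One way such an asymmetry can arise from circuit logic alone is illustrated by the minimal Python sketch below. It simulates a three-node cascade in which an upstream signal X activates a regulator Y, and the promoter of a target Z responds to X and Y through either an “and” or an “or” gate (the coherent feedforward loop discussed later in this review). The function name, the unit rate constant, the activation threshold, and the step-like promoter logic are illustrative assumptions and are not taken from the cited studies.

```python
def ffl_delays(logic, dt=0.01, on_step=100, off_step=1100, n_steps=2000,
               rate=1.0, threshold=0.5):
    """Return (on_delay, off_delay) for the promoter of target Z.

    X is a square pulse, on from on_step up to (not including) off_step.
    Y follows dY/dt = rate * X - rate * Y, so it relaxes toward 1 while X is
    on and toward 0 while X is off. Y counts as active above `threshold`.
    Only the promoter activity of Z is tracked, not the Z protein level.
    """
    if logic not in ("and", "or"):
        raise ValueError("logic must be 'and' or 'or'")
    y = 0.0
    first_on = None   # step at which the Z promoter first becomes active
    first_off = None  # first step at or after off_step with the promoter inactive
    for step in range(n_steps):
        x_on = on_step <= step < off_step
        y_on = y > threshold
        z_on = (x_on and y_on) if logic == "and" else (x_on or y_on)
        if z_on and first_on is None:
            first_on = step
        if (first_on is not None and first_off is None
                and step >= off_step and not z_on):
            first_off = step
        # Forward-Euler update of Y (assumed first-order production and decay).
        y += dt * rate * ((1.0 if x_on else 0.0) - y)
    if first_on is None or first_off is None:
        return None, None
    return (first_on - on_step) * dt, (first_off - off_step) * dt


for gate in ("and", "or"):
    on_delay, off_delay = ffl_delays(gate)
    print(f"{gate.upper()} gate: on delay {on_delay:.2f}, off delay {off_delay:.2f}")
```

Under these assumptions, the “and” gate delays the response to the on switch but not to the off switch, whereas the “or” gate does the reverse. The delay in either case is set by the time Y needs to cross its threshold.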

Sign-sensitive delays are common in responses of microorganisms to changes in nutrients. For example, consider the arabinose-utilization system of E. coli, in which cyclic adenosine monophosphate (cAMP) regulates transcription from the L-arabinose operon. The transcriptional response to an increase in cAMP (i.e., “on” sign) is much slower than to a cAMP decrease (i.e., “off” sign) (Mangan et al., 2003). One possible reason for this asymmetry is that, at least inside a mammalian host, the “on” state is common whereas the “off” state is maintained only during short and rare pulses of glucose. Consequently, although the cell can halt the production of L-arabinose genes soon after the introduction of glucose, it can tolerate slower commencement of their production when glucose levels decrease and cAMP is produced (Mangan et al., 2003). Alternatively, a sign-sensitive delay may reflect noise filtering; the cell refrains from activation of response pathways following spurious or transient signals. For the arabinose system, the “on” switch delay is approximately 20 min, comparable to the timescale of spurious pulses of cAMP in other natural settings (Alon, 2007).

Conversely, a delayed response to the “off” switch can prolong the effect of a transient stimulus. For example, the expression of flagella motor genes in E. coli persists for 1 hr after the biogenesis input signal is turned off, but no delay occurs during the on switch. Indeed, this delay time in shutting down is comparable to the time needed for the biogenesis of a complete flagella motor (Kalir et al., 2005).

Similar principles of signal processing in impulse responses have also been observed in mammalian systems. For instance, a small regulatory circuit that controls the expression of the gene encoding the proinflammatory cytokine interleukin-6 (IL-6) in mouse macrophages exhibits a delayed response to lipopolysaccharide (LPS) stimulation (the on switch) and discriminates between transient and persistent signals in the innate immune system (Litvak et al., 2009). Other “persistence detection” mechanisms have also been observed in transcriptional responses to DNA damage (Loewer et al., 2010), to epidermal growth factor (EGF) (Amit et al., 2007a), and to extracellular-signal-regulated kinase (ERK) signaling (Murphy et al., 2002).

Transcriptional Anticipation as an Adaptation to Dynamic or Noisy Environments

Most studies of environmental stimuli in the lab focus on one sustained signal at a time, but the natural environment to which cells are adapted is substantially more complex, noisy, and irregular (Lopez-Maury et al., 2008; Wilkinson, 2009). Impulse-like transcriptional programs reflect some strategies that cells employ to handle such temporally fluctuating environments.

Random fluctuations are optimally handled by sensing environmental changes and specifically responding by transcriptional changes in relevant genes, as described above (e.g., Capaldi et al., 2008; Gasch et al., 2000). In certain cases, a population of cells may respond stochastically; they activate different changes in gene expression in different cells of the same population, thus “hedging” their adaptive bets (Lopez-Maury et al., 2008).

When fluctuations are stable and predictable, bacteria and yeast cells may use an anticipatory strategy for gene regulation (Mitchell et al., 2009; Tagkopoulos et al., 2008). For example, when exposed to heat shock, yeast induce an impulse response of genes needed for oxidative stress, although these genes are not directly necessary for adaptation to heat shock. Interestingly, yeast do not induce heat shock genes in response to oxidative stress (Mitchell et al., 2009). This asymmetry (Figure 1D) may reflect the predictable order of the two stresses under natural circumstances: oxidative respiration and accumulation of oxidative radicals follow a temperature increase during fermentation.

Notably, this anticipation strategy differs from symmetrical cross-protection (Kultz, 2005) through shared stress-response pathways (Gasch et al., 2000). Rather, it indicates that any optimization of transcriptional programs during evolution occurred in a complex adaptive landscape. Thus, a strategy that may appear “suboptimal” when considering only one stimulus in the lab may indeed be optimal in the presence of multiple simultaneous or sequential stimuli.

Higher-Order Temporal Coordination of Impulse Responses

A functional temporal program of gene expression requires appropriate temporal coordination between genes (Figure 1F). Studies reveal two main classes of temporal coordination: regulatory modules and timing motifs.

A regulatory module consists of genes that are coexpressed with the same temporal pattern or amplitude (FANTOM consortium et al., 2009; Gasch et al., 2000; Spellman et al., 1998). Regulatory modules serve to coordinate the production of proteins that are needed to perform relevant cellular functions in the given response. Regulatory modules are a hallmark of all known transcriptional programs and all known temporal patterns (Figure 1), including oscillatory patterns (e.g., Spellman et al., 1998), sustained responses (e.g., FANTOM consortium et al., 2009), and impulse responses (e.g., Chechik et al., 2008).

Complementing the tight temporal coincidence within regulons, timing motifs reflect a particular order of transcriptional events among genes or modules, such as a linear cascade of genes with sequentially ordered expression (Alon, 2007; Chechik et al., 2008; Ihmels et al., 2004). In microorganisms, such ordering is commonly observed among genes encoding metabolic and biosynthetic enzymes, and therefore, it can play an important role in achieving metabolic efficiency or avoiding toxic intermediates (Chechik et al., 2008; Ihmels et al., 2004; Zaslaver et al., 2004). For example, following deprivation of amino acids, E. coli induces the expression of amino acid metabolic genes in the same order that their encoded enzymes are present in the relevant amino acid biosynthetic pathway (Zaslaver et al., 2004). This “just-in-time” pattern (Zaslaver et al., 2004), which may optimize resource utilization, has also been observed in other bacterial processes, most notably flagellar biogenesis (Kalir et al., 2005).

A broader range of ordered patterns of expression onset, typically in impulse responses, is found in metabolic enzymes in yeast (Chechik et al., 2008; Ihmels et al., 2004). These include timing motifs with gene expression in the same order as the metabolic pathway (i.e., a just-in-time induction or shutoff of a pathway), as well as in the reverse order to the metabolic pathway. These reversed directions possibly contribute to the fast removal of an end metabolite that is either toxic or otherwise disruptive under the new condition (Chechik et al., 2008).
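To make the notion of a timing motif concrete, the sketch below estimates an onset time for each gene from a sampled time course and asks whether the onsets follow the pathway order (“just-in-time”), the reverse order, or neither. It is a hedged illustration only: the half-maximal onset definition, the function names, the gene labels, and the expression values are hypothetical and are not the procedure or data of the cited studies.

```python
def onset_time(times, levels, fraction=0.5):
    """First sampled time at which expression has moved `fraction` of the way
    from its initial level to its most extreme level (up or down).
    Returns None for a flat profile."""
    base = levels[0]
    extreme = max(levels, key=lambda v: abs(v - base))
    threshold = fraction * abs(extreme - base)
    if threshold == 0:
        return None
    for t, v in zip(times, levels):
        if abs(v - base) >= threshold:
            return t
    return None


def classify_timing_motif(pathway_order, onsets):
    """Compare onset times with the order of enzymatic steps in a pathway.

    pathway_order: gene names listed from the first to the last step.
    onsets: dict mapping gene name to onset time (or None).
    """
    if len(pathway_order) < 2:
        raise ValueError("need at least two genes to define an order")
    ts = [onsets[g] for g in pathway_order]
    if any(t is None for t in ts):
        return "unordered"
    pairs = list(zip(ts, ts[1:]))
    if all(a < b for a, b in pairs):
        return "forward"       # same order as the pathway (just-in-time)
    if all(a > b for a, b in pairs):
        return "reverse"       # opposite to the pathway order
    if all(a == b for a, b in pairs):
        return "simultaneous"
    return "unordered"


# Hypothetical time course (minutes) for three consecutive pathway enzymes.
time_points = [0, 5, 10, 15, 20, 30]
profiles = {
    "enzA": [1, 6, 8, 8, 5, 2],
    "enzB": [1, 2, 6, 8, 7, 3],
    "enzC": [1, 1, 2, 6, 8, 4],
}
onsets = {gene: onset_time(time_points, levels) for gene, levels in profiles.items()}
print(onsets, classify_timing_motif(["enzA", "enzB", "enzC"], onsets))
```

A threshold-crossing onset is sensitive to sampling density and noise, so a real analysis would likely fit a smooth profile to each gene before comparing onsets.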

Coordinated timing motifs are also found at metabolic branch points (Chechik et al., 2008; Ihmels et al., 2004). For example, consider a metabolic funnel, where two enzymes (A, B) produce complementary metabolites that are together consumed by a third reaction (catalyzed by C). In the “funnel-same-time” motif, the genes that encode the three enzymes (A, B, and C) are often expressed simultaneously, thus optimizing metabolite use by coordinating the production or consumption of metabolites along codependent branches. Similar temporal coordination was found for the genes encoding enzymes in “forks,” involving one enzyme producing two metabolites, which are then consumed by two separate reactions.

Ordered Impulse Responses within State-Transitioning Systems

Cell-fate decisions are typically associated with stable changes in gene expression that transition the regulatory system from one steady state to the next (Figure 1E). Such cell-fate decisions are prevalent in development (Basma et al., 2009; Nachman et al., 2007; Oliveri et al., 2008), pathogenesis (Iliopoulos et al., 2009), and immune responses (Amit et al., 2007a, 2009; Ramsey et al., 2008; Wei et al., 2009). State transitioning in cells involves sustained induction or repression of gene expression, stabilizing the cell on a new characteristic expression program, and disassociating it from its precursors. Nevertheless, processes that lead to such stable changes often involve a succession of impulse responses that promote transient effects necessary for achieving the transition.

Such a combination of transient and stable changes in transcription was observed during PMA (phorbol myristate acetate)-induced differentiation of myelomonocytic leukemia (THP-1) cells (FANTOM consortium et al., 2009). Sustained responses included repression of genes required for cell-cycle progression and DNA synthesis, which is consistent with the growth arrest associated with PMA-induced differentiation. In addition, genes that characterize the differentiated phenotype (e.g., immune response) were persistently induced. Conversely, transient, impulse-like changes were associated with various transcription factors that play an important role early in the transition, promoting the differentiation program prior to repression of the factors that maintain the undifferentiated state. A similar pattern, specifically immediate early impulse responses of key regulators followed by stable changes of downstream genes, has been observed in many other mammalian systems, including responses to growth (Amit et al., 2007a), pathogens (Amit et al., 2009), and stress (Murray et al., 2004) signals.

Impulse responses are not limited to the immediate wave of transcription at the beginning of the state transition. Rather, a succession of impulses, forming a series of transcriptional “waves,” has been observed in various state-transitioning responses (Amit et al., 2007a, 2009; Ramsey et al., 2008; Shapira et al., 2009). For instance, the response of immune dendritic cells to pathogens involves several waves of induction in which coregulated genes follow a simple impulse profile with distinct onset and offset times (Amit et al., 2007a). As in PMA-induced differentiation, the first wave is an immediate-early response enriched for genes that encode proteins with roles in transcriptional regulation. Then a subsequent transcriptional wave is enriched for genes that are required for extracellular signaling (e.g., interferon-beta 1 or IFNB1) and motility (e.g., chemokine ligand 3 or CCL3) during that time interval (2–4 hr post-stimulus) in the in vivo innate immune response. This temporal organization allows innate immune cells to activate the CCL3 ligand at the appropriate time, favoring the migration of activated cells to the draining lymph node to activate the adaptive immune response.

Long transcriptional cascades of ordered sequential regulation are also at the basis of many complex developmental processes (Davidson, 2010). For instance, in the sea urchin embryo, the transcriptional program of skeletogenic cell development in endomesoderm specification includes several layers of regulation that correspond to developmental phases (Oliveri et al., 2008). Progression through the phases is facilitated by a regulatory cascade in which transcription factors that are active during one phase (e.g., early micromere specification) activate genes in the next phase (e.g., late specification). Notably, transcriptional changes in genes encoding regulatory factors can also feed back and regulate the expression of their temporal “predecessors” (Amit et al., 2007a). Such mechanisms are used to shape both impulse responses and sustained responses, as we discuss below.

Mechanism of Temporal Control of Impulse Responses

What is the cell’s capacity to “compute” a temporal pattern of mRNA expression? Are there canonical molecular mechanisms that underlie distinct types of patterns? Can a single mechanistic unit generate more than one pattern depending on the incoming signal or its downstream target? In this section, we focus on the molecular mechanisms that generate impulse responses at single genes, gene modules, and temporal motifs.

Network Architecture Can Be Decomposed into Characteristic Topological Motifs

Regulatory systems that control gene expression are often represented as networks (i.e., directed graphs) with the nodes corresponding to regulatory proteins (e.g., transcription factors) and the edges linking a DNA-binding protein to proteins encoded by genes it binds to and regulates (e.g., Hu et al., 2007; Lachmann et al., 2010; Shen-Orr et al., 2002). Such graphs have been assembled from many small-scale studies on regulation of individual genes and operons (Shen-Orr et al., 2002) or by systematic chromatin immunoprecipitation (ChIP), in vitro assays, and computational analysis of cis-regulatory sequence elements (Badis et al., 2009; Harbison et al., 2004; Hu et al., 2007; Lachmann et al., 2010; Lee et al., 2002).

Although network graphs appear highly complex, they can be effectively decomposed to putative functional units based on recurring topological patterns (Figure 2). These “network motifs” (Shen-Orr et al., 2002) are small subnetworks consisting of only a few nodes and edges with a topological pattern that is significantly overrepresented in the transcriptional graph.

Although the patterns themselves are static, they can be associated, analytically (Bolouri and Davidson, 2003; Goentoro et al., 2009; Kittisopikul and Suel, 2010; Mangan et al., 2003; Shen-Orr et al., 2002; Tyson et al., 2003) or experimentally (Basu et al., 2004; Cantone et al., 2009; Kaplan et al., 2008; Mangan et al., 2006; Rosenfeld et al., 2002), with different dynamic interpretations, thus relating the architecture of these network components with a functional capacity for generating temporal responses (Figure 2). These responses include rapid or slowed responses (Figures 2A and 2B), feedback control (Figure 2C), sign-sensitive delays (Figure 2D), temporal ordering (Figure 2E), and temporal coordination in modules (Figure 2F).

Figure 2. General Network Motifs in Transcriptional Regulatory Networks
General motifs found in transcriptional regulatory networks are shown. Nodes represent proteins; edges are directed from a DNA-binding protein to a protein encoded by a gene to which it binds and regulates. Arrows and blunt-arrows represent activation and repression, respectively; circle-ending arrows are either activation (+) or repression (−). Relevant functions for these motifs are listed.

Notably, the relation between the topology of a motif and its induced temporal pattern is far from unique and depends on the characteristics of the incoming signal and of the interacting molecules (Macia et al., 2009). For instance, protein production rate, protein degradation rate, or activation thresholds of regulators can each alter the dynamic transcriptional pattern generated by the motif (Lahav et al., 2004). Moreover, different motifs or combinations of motifs (Geva-Zatorsky et al., 2006) can induce similar behaviors. For a more thorough discussion of network motifs, we refer the reader to other extensive reviews (Alon, 2007; Davidson, 2009, 2010; Tyson et al., 2003).

Combinatorial Logic in the Feedforward Loop Generates Sign-Sensitive Delays

The feedforward loop (Figure 2D) is a major building block of combinatorial regulation (Amit et al., 2009). A feedforward loop has a unidirectional structure consisting of three nodes: an upstream regulator X that regulates a downstream regulator Y, which in turn regulates a downstream target Z (which is not necessarily a regulator). An additional edge is directed from X to Z, thus closing a unidirectional “loop.” Each interaction can be suppressing or activating, resulting in eight distinct feedforward loop structures.

One commonly found structure in transcriptional networks (Alon, 2007) is the Type-1 coherent feedforward loop, in which all of the interactions are activating. This feedforward loop can generate a sign-sensitive time delay. The length of the delay and whether it occurs during the off or the on switch depends on the specific molecular parameters of the loop. The particular logic mediated by the loop largely depends on the organization of cis-regulatory elements in the promoter of the target gene (“Z”). For instance, when the two transcription factors in a coherent feedforward loop exhibit an “or” logic at the promoter of the downstream gene (i.e., only one transcription factor suffices to activate the gene), the resulting dynamics is usually a sign-sensitive delay with faster response to the on switch and a prolonged transcriptional response, as in flagellar biogenesis (Kalir et al., 2005). Conversely, an “and” logic for the two transcription factors (i.e., both factors are needed to activate the gene) is associated with a faster response to the off switch, as in the L-arabinose operon. This feedforward loop structure facilitates persistence detection (Mangan et al., 2003). Another prevalent form of the feedforward loop is the incoherent variant (Figure 2D), in which Y acts as a repressor rather than an activator. Depending on its parameters, this motif can induce pulse-like responses (Basu et al., 2004), lead to a rapid (Mangan et al., 2006) or nonmonotone (Kaplan et al., 2008) response of the downstream target Z, or provide a mechanism for detecting fold-change (e.g., that a component’s level changed by 2-fold rather than an absolute value) (Goentoro et al., 2009).

Single-Input Modules and Chromatin Architecture Coordinate Responses in Modules and in Just-in-Time Motifs

The single-input module (Figure 2F) motif occurs when a single regulator has multiple targets (Alon, 2007; Lee et al., 2002). This architecture, often associated with regulatory hubs (“master regulators”), can facilitate a temporally coordinated response of multiple genes in a module.

However, the activation of the downstream genes in a single-input module is not necessarily concurrent, and differences in their promoter properties can lead to ordered activation (Figure 3). Specifically, a transcription factor’s affinity for a specific cis-regulatory sequence affects the fraction of time that it occupies a binding site (Bruce et al., 2009; Tanay, 2006). The stronger the binding affinity, the higher the probability that the transcription factor remains bound to a site and recruits the transcriptional machinery (Hager et al., 2009). Differential recruitment at different promoters results in a range of induction thresholds, allowing a single transcription factor with a temporally fluctuating level to generate an ordering of its target genes.

This principle was demonstrated in a recent study using a series of genetically modified promoters of the Pho5 gene during the response to phosphate starvation in yeast (Lam et al., 2008). In this system (Figure 3), promoters with high-affinity sites for the transcription factor Pho4 that are “open” (i.e., not occluded by nucleosomes) responded to weaker signals of slight phosphate deprivation (Figure 3B) and had a shorter response time (Figure 3A) to phosphate starvation compared to those with lower-affinity sites. Similar behavior was observed for synthetic promoter variants and for different targets of Pho4 that had similar promoter architecture.
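The threshold argument above can be sketched numerically. Assume, purely for illustration, that the nuclear transcription factor level rises exponentially toward a plateau set by signal strength, that promoter occupancy follows simple one-site binding, and that a promoter fires once occupancy reaches one half. The activation time of each promoter then follows in closed form. The dissociation constants, plateau levels, and promoter labels below are hypothetical; none are measured values from the Pho4 system.

```python
import math


def activation_time(kd, tf_plateau, tau=1.0, occupancy_needed=0.5):
    """Time at which promoter occupancy first reaches `occupancy_needed`.

    Assumed model (not from the source):
      TF(t)     = tf_plateau * (1 - exp(-t / tau))
      occupancy = TF / (TF + kd)
    Returns None if the plateau is too low for the promoter ever to fire.
    """
    # Occupancy >= theta is equivalent to TF >= kd * theta / (1 - theta).
    tf_needed = kd * occupancy_needed / (1.0 - occupancy_needed)
    if tf_needed >= tf_plateau:
        return None
    return -tau * math.log(1.0 - tf_needed / tf_plateau)


# Hypothetical promoters: a smaller kd means a higher-affinity binding site.
promoters = {"high_affinity": 0.2, "medium_affinity": 1.0, "low_affinity": 3.0}


def activation_schedule(tf_plateau):
    """Activation time of every promoter for a given signal strength."""
    return {name: activation_time(kd, tf_plateau) for name, kd in promoters.items()}


strong = activation_schedule(tf_plateau=5.0)  # stand-in for severe starvation
mild = activation_schedule(tf_plateau=0.5)    # stand-in for slight deprivation
print("strong signal:", strong)
print("mild signal:  ", mild)
```

In this toy model a strong signal activates all three promoters in order of affinity, whereas a mild signal activates only the high-affinity promoter, and does so more slowly. Nucleosome occlusion is not modeled here; it could be approximated by raising the effective dissociation constant of an occluded site.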

Cell 144, March 18, 2011 ©2011 Elsevier Inc.

Thus, graded binding affinities complement the single-input module motif in which a single transcription factor induces temporal ordering among its targets through differential binding affinity (Figure 3A). In the phosphate starvation responses, this results in tuning of the responding genes to the severity and duration of phosphate depletion. At intermediate phosphate levels (with intermediate levels of nuclear Pho4), first-response genes with exposed high-affinity sites like PHO84 and PHM4 allow the cell to take up environmental phosphate and mobilize internal reserves. Under starvation conditions, this initial response is followed by a second-order response such as upregulation of PHO5 and other phosphate-scavenging components (Springer et al., 2003).

More generally, such graded affinities may explain the ordered timing of an impulse-like response of genes within metabolic pathways, in timing motifs such as just-in-time. In yeast, the timing of ordered activation in a timing motif was found to correlate with the affinity of the respective gene with its regulating transcription factor (Chechik and Koller, 2009; Chechik et al., 2008). Similar principles were also observed in E. coli (Zaslaver et al., 2004).

Nucleosome Positioning Contributes to Activation Timing
The position of nucleosomes in a gene promoter impacts the accessibility of transcription factors for their DNA-binding sites. Therefore, nucleosome positioning also affects the order of activations across several genes regulated by the same transcription factor. This effect was convincingly demonstrated in the Pho5 system (Lam et al., 2008). Most Pho4-binding sites are occluded under nucleosomes in normal conditions, but they become exposed when chromatin is dynamically remodeled in response to phosphate starvation (Figure 3B).
The threshold of response, and hence a gene’s onset time, is thus also affected by the chromatin architecture of the repressed state. Conversely, the dynamic range of the response is determined by the active state’s architecture. Maximum transcriptional outputs of the Pho5 variants differed by up to 7-fold and correlated with the number, affinity, and placement of Pho4 sites, irrespective of their accessibility in the initial (pre-starvation) chromatin state. These results suggest a mechanism by which the cell decouples the determinants of promoter activation timing (site affinity and nucleosome positions) from the determinants of expression capacity (site affinity alone). Global studies on changes in nucleosome positions in response to environmental signals (Deal et al., 2010) support the generality of the Pho model, at least in yeast (Shivaswamy et al., 2008).

Figure 3. Promoter Regions and Nucleosome Positioning as Temporal Signal Processors
(A) The transcription factor Pho4 (orange oval) targets different variants of the Pho5 promoter following phosphate starvation in yeast cells (left). The purple (upper) promoter contains the wild-type Pho5 promoter sequence, whereas the green and red promoters (denoted as H1 and H3, respectively) are synthetic variants. Each target exhibits a different response time (right), depending on the affinity of Pho4 for its binding site when the site is unoccluded by nucleosomes (depicted in panel B). The y axis corresponds to median fluorescence levels, across separate measurements, scaled between the promoter-specific expression minimum at 0 hr and maximum at 7 hr after induction.
(B) Suggested mechanism for decoupling promoter induction threshold from dynamic range. These cartoons show occupancies of Pho4 and nucleosome at the three Pho4 promoter variants under mild (left) and acute (right) phosphate starvation. Gray-blue and yellow ovals represent nucleosomes and Pho4, respectively; dark blue circles and red triangles correspond to low-affinity and high-affinity binding sites, respectively; and X marks ablation of the Pho4-binding motif. Darker blue ovals represent more highly occupied nucleosomes (across a cell population). Under intermediate levels of phosphate (left), substantial Pho4 occupancy and subsequent transcriptional activity occurs only at promoters with exposed high-affinity sites. The plot at the bottom left shows the respective expression levels, divided for each variant by the maximum level at full starvation in arbitrary units (a.u.). In the absence of phosphate (right), Pho4 activity is saturated, resulting in nucleosome eviction and maximum expression at all promoters. The plot at the bottom right shows the respective maximal induction levels (a.u.). Reproduced from Lam et al. (2008), with permission from the authors.

Protein Oscillators Generate Coordinated Impulse Responses across Regulons
Recent studies suggest that oscillations in the localization or activity of trans-regulators that control single-input modules play a substantial role in governing (nonoscillating) impulse transcriptional patterns. Most notably, coordinated impulse patterns across a regulon may often stem from limited oscillations in the nuclear localization of a regulatory factor controlling the target genes (Ashall et al., 2009; Cai et al., 2008). This has been suggested for the transcription factor Crz1 in yeast, which uses a “pulsing” mechanism to encode information about extracellular calcium levels (Figure 4) (Cai et al., 2008). When extracellular

calcium increases, Crz1 is dephosphorylated and exhibits short bursts of translocation to the nucleus. At higher levels of calcium, the cells respond, not by increasing the amount of nuclear Crz1 in each translocation burst but rather by increasing the frequency of the bursts (Figure 4A).

Figure 4. Coordinated Impulse Response Generated by Protein Oscillators
(A) In response to extracellular calcium, yeast cells initiate bursts of nuclear localization of the transcription factor Crz1. Bottom left: A single-cell time trace of the amount of phosphorylated Crz1 in the nucleus; the arrow indicates introduction of extracellular calcium. Bottom right: The frequency of bursts (y axis) rises with calcium levels (x axis). Error bars calculated by using different thresholds for burst determination (see Cai et al., 2008). Inset: A histogram of burst duration times under high (red) and low (blue) calcium levels indicates that burst duration is independent of calcium concentration.
(B) Expression levels of three synthetic Crz1-dependent promoters increase proportionally to extracellular calcium concentration (x axis). On the y axis, data are divided, for each variant, by the expression at maximum calcium level. The synthetic promoters have 1 (red), 2 (green), or 4 (blue) calcineurin-dependent response elements. Inset: A bar chart showing the fold-change of the different targets, following Crz1 overexpression. The targets exhibit different responses, probably due to their different numbers of Crz1-binding sites. Reproduced from Cai et al. (2008) with permission from the authors.

Such “frequency modulation” may be important because of the nonlinearity (Yuh et al., 2001) and diversity (Kim et al., 2009) of the input functions associated with different target promoters. Because distinct Crz1 target promoters (Figure 4A) probably respond differently to changing levels of Crz1, amplitude modulation of Crz1 would not maintain their relative ratios (Figure 4B, inset). In contrast, modulation of the frequency of Crz1’s nuclear localization can control the expression of multiple target genes in a more proportional manner and thus maintain more stable ratios of gene expression, regardless of the shapes of their input functions (Figure 4B, main graph). This behavior might be explained by the fact that a strong nonlinear component (i.e., dependence on Crz1 magnitude) is now kept relatively constant for different calcium levels, and the variable part is the amount of time the promoters are exposed to a fixed amount of nuclear Crz1.

Oscillations in the level or localization of transcription factors have been observed in diverse environmental responses, such as those involving NF-κB (nuclear factor κ-light-chain-enhancer of activated B cells) (Ashall et al., 2009; Covert et al., 2005; Friedrichsen et al., 2006; Nelson et al., 2004; Tay et al., 2010) and the tumor suppressor p53 (Geva-Zatorsky et al., 2006; Loewer et al., 2010) in mammals and the SOS response to DNA damage in bacteria (Friedman et al., 2005). In the p53 and SOS systems, monitoring with high temporal resolution revealed tightly regulated oscillations in the nuclear levels of the key regulators (e.g., p53) with variable amplitude but more precise timing.

Oscillations in regulatory proteins, which are driven by external stimuli, often lead to nonoscillatory, impulse transcriptional patterns. For example, the expression of p21, a p53-target gene, is induced in a nonoscillatory manner during DNA damage (Loewer et al., 2010). Similarly, oscillations in NF-κB localization and activity following TNF-α stimulation are coupled to impulse-like patterns in a host of early response genes, such as the NF-κB inhibitor IκBα, even when assessed at the single-cell level (Tay et al., 2010).

Thus, protein oscillations in environmental response systems may play a general mechanistic role in regulating downstream impulse transcriptional changes. First, oscillation of transcription factor levels can maintain a steady response as long as the damage signal is present and constitutive supply of the downstream gene products is needed (as in the p53 response). Second, oscillations in transcription factor localization can underlie the induction of proportional responses through frequency modulation (as with Crz1). Finally, combinations of protein oscillators can generate various transcriptional kinetic patterns. For instance, activation of NF-κB in mouse embryo fibroblasts treated with LPS depends on two pathways, MyD88-dependent and MyD88-independent (Covert et al., 2005). Perturbing either one of these pathways and leaving the other one intact leads, in both cases, to oscillatory NF-κB activity. However, when both pathways are intact, both oscillators act upon LPS stimulation but with a relative phase shift of 30 min, resulting in a stable, nonoscillatory pattern of NF-κB activity. It is likely that other combinations, as well as modulation of both amplitude and frequency, will play a role in encoding other complex patterns of transcriptional regulation at single genes and gene modules.

Attenuation and Ordering of Impulse Responses through Feedback and Cascades
Impulse patterns can be attenuated and ordered in more complex programs and through more elaborate regulatory architectures, most notably within developmental programs. In

particular, in the cascade motif (Figure 2E), regulators are ordered in layers, and proteins from one layer control ones in subsequent layers (Hooshangi et al., 2005; Rappaport et al., 2005). This pattern was observed in transcriptional networks during sea urchin development (Bolouri and Davidson, 2003; Davidson, 2009, 2010; Oliveri et al., 2008), state-transitioning systems in microorganisms (Chu et al., 1998), and environmental responses in mammalian cells (Amit et al., 2007a, 2009; Ramsey et al., 2008; Shapira et al., 2009). A cascade-like network topology entails an inherent temporal order of regulation events (Hooshangi et al., 2005). It was postulated to enable context-specific responses (Davidson, 2009) and to provide robustness both to spurious input signals (Hooshangi et al., 2005) and to noise in the rates of protein production (Rappaport et al., 2005).

Regulatory interactions between different layers in a cascade can form multicomponent loops in which genes in a late transcriptional wave regulate genes from earlier waves (Figure 2C). The ensuing feedback effect can contribute to the ultimate attenuation of impulse responses, even under a sustained signal (Amit et al., 2007b). For example, stimulation of human cell lines with EGF induces several ordered impulse responses (Amit et al., 2007a), including the induction of “delayed early” genes. Delayed early genes are primarily induced by transcription factors that were themselves induced as “immediate early” genes. Delayed early genes encode a large number of signaling proteins and RNA-binding proteins that attenuate RNA levels and protein activity of the initial response pathways. Such negative transcriptional feedback mediated through a transcriptional cascade is common in environmental responses in yeast as well (Segal et al., 2003).

A more basic form of feedback is the autoregulatory loop by which a transcription factor regulates its own gene. Negative autoregulation (Figure 2A) facilitates a rapid transcriptional response of the autoregulating gene. It has been associated with the induction of a rapid impulse response to EGF stimulation in human cells (Amit et al., 2007a) and to DNA damage in E. coli (Camas et al., 2006). Conversely, the positive autoregulatory loop (Figure 2B) is associated with the opposite effect because it results in a slow response time (Alon, 2007). Positive loops, with either one or more components (Figure 2B or Figure 2C, respectively), can lead to substantial variation between isogenic cells, due to stochastic effects, and can play an important role in maintaining stability after state transitioning (Davidson, 2009; Kim et al., 2008; Macarthur et al., 2009; Oliveri et al., 2008).

Perspective
Diverse mechanisms drive impulse-like changes in mRNA levels, which can occur on a broad range of timescales, from rapid environmental stress responses to slower and more elaborate developmental processes. What can we learn by comparing these processes across timescales? The emerging picture supports a few basic principles. Just-in-time responses and sign sensitivity optimize process efficiency, whereas the organization of the impulse responses in functional waves and cascades provides temporal compartmentalization and order to gene expression.

Although in this Review, we have made convenient distinctions between impulse responses, state transitions, and oscillators, most biological systems intertwine these temporal patterns. For example, oscillations in protein levels or localization can also lead to impulse responses, and ordered impulses are important in generating sustained responses through cascades. Furthermore, many of the underlying molecular mechanisms driving these temporal patterns can be intimately linked. For example, a gene may be poised for transcription with a preinitiation complex in anticipation of both developmental and environmental stimuli.

Similarly, the mechanistic regulatory building blocks surveyed here are typically embedded within a wider network context. First, many responses, especially in metazoans, involve a large number of inputs into a single promoter during both environmental and developmental responses (Amit et al., 2009; FANTOM consortium et al., 2009). In addition, transcriptional cascades are often combined with other motifs, such as negative feedbacks (Amit et al., 2007a), feedforward loops (Basu et al., 2004; Shen-Orr et al., 2002), and single-input modules (Shen-Orr et al., 2002). Such elaborate loops (Figure 2C) and cascades (Figure 2E) are essential to generate temporal order and stable cell states in developmental systems (Davidson, 2009; Hooshangi et al., 2005; Kim et al., 2008; Lee et al., 2002; Li et al., 2007; Macarthur et al., 2009; Oliveri et al., 2008; Rappaport et al., 2005). Furthermore, multiple cis-regulatory elements and sequences affecting nucleosome positions are integrated within more complex cis-regulatory functions in both yeasts (Gertz et al., 2009; Raveh-Sadka et al., 2009) and metazoans (Kaplan et al., 2009; Yuh et al., 2001; Zinzen et al., 2009).

Both computational studies and synthetic molecular circuits (Cantone et al., 2009) have provided additional insights into the crosstalk between motifs (Ishihara et al., 2005; Ma et al., 2004) and into the dynamics of complex networks (Walczak et al., 2010) that integrate multiple motifs. Nevertheless, the correspondence between simple subnetworks and motifs and the observed temporal patterns of mRNA levels (Alon, 2007; Davidson, 2009, 2010) suggests a substantial degree of modularity in the operation of regulatory systems.

Most of the mechanisms driving mRNA concentrations described in this Review, and that have been deciphered in detail so far, are transcriptional, but other pathways also affect mRNA levels, including mRNA processing, transport, and degradation. Although recent studies (Shalem et al., 2008) suggest that such mechanisms can play a substantial role in shaping temporal profiles of mRNA levels, these mechanisms are still far less understood than transcription regulation. Indeed, the scarcity of experimental methods to monitor these processes has hampered progress in this area. However, we anticipate that recent advances in massively parallel cDNA sequencing (RNA-Seq) (Mortazavi et al., 2008) will help advance this front.

More generally, deciphering circuitry and understanding the capacity of molecular mechanisms to encode complex signals and decode them into specific responses will require tight integration between experiments, analysis, and computation, in particular for temporal responses. First, there is a substantial need for direct manipulation of both signals, for example using microfluidic devices, and of individual components, by manipulation of either trans-components (Amit et al., 2009; Costanzo et al., 2010; FANTOM consortium et al., 2009) or cis-sequences

(Gertz et al., 2009; Patwardhan et al., 2009). Monitoring temporal responses in segregating populations (Eng et al., 2010) can provide a complementary means for testing the effect of many simultaneous genetic perturbations. Analytical methods and computational models can guide the design of these perturbations to a search space that is maximally informative and biologically relevant. For example, sequence models of gene regulation (Gertz et al., 2009; Raveh-Sadka et al., 2009) can help investigators make relevant promoter variants to test, whereas provisional models of trans-regulation (Amit et al., 2009) can help narrow down targets for gene silencing or disruption.

Improving the ability to monitor a larger number of circuit components over time in living cells is important for broadening the scope of single-cell studies and for deepening our understanding of population-level phenomena observed with genomics profiling technologies. Recent advances in simultaneously monitoring in vivo multiple types of RNA (Kern et al., 1996; Muzzey and van Oudenaarden, 2009) or proteins (Bandura et al., 2009) are promising. Notably, although the difference between a single-cell and population view is a recurring theme of recent studies, reconciling the two is important for a functional understanding of a response, especially in multicellular organisms (Simon et al., 2005). For example, a recent study of the NF-κB response to TNF-α stimulation showed that the observed cellular heterogeneity may be optimal for achieving a functional population (or mean) response for paracrine cytokine signaling (Paszek et al., 2010).

Computational analysis of time course data presents several challenging problems. These include, among others, identifying differentially expressed genes, grouping them into clusters of similar temporal patterns, and inferring their regulatory interactions. Recent studies have shown that a useful algorithmic starting point is to derive a continuous representation of transcriptional profiles by fitting to a particular mathematical function (Chechik and Koller, 2009; Storey et al., 2005). Specifically, impulse responses fit well to a certain class of sigmoid “impulse-like” functions, which have a small number of biologically interpretable parameters (e.g., onset time) (Chechik and Koller, 2009; Chechik et al., 2008). The fitted continuous representations can then be used in conjunction with the original expression values, aiming to provide a more robust analysis, particularly for differential expression (Storey et al., 2005) and clustering (Chechik and Koller, 2009; Chechik et al., 2008).

Despite these advances and the vast amount of research on the more advanced task of regulatory network inference (Bansal et al., 2007; Karlebach and Shamir, 2008), there is still much to be accomplished. The emerging complexity of regulatory mechanisms and the expected availability of more diverse and refined temporal data leave substantial room for developing more refined mechanistic models of gene regulation, which account for both cis and trans elements and their integration in time. Finally, advances in synthetic biology promise the ability not only to manipulate biological entities but also to design systems to aid the development and interpretation of analytical models with increasing complexity. This would be particularly critical to decipher the complex web of interactions and the multiplicity of inputs that determine temporal changes in gene regulation in living cells.

REFERENCES
Alon, U. (2007). Network motifs: theory and experimental approaches. Nat. Rev. Genet. 8, 450–461.
Amit, I., Citri, A., Shay, T., Lu, Y., Katz, M., Zhang, F., Tarcic, G., Siwak, D., Lahad, J., Jacob-Hirsch, J., et al. (2007a). A module of negative feedback regulators defines growth factor signaling. Nat. Genet. 39, 503–512.
Amit, I., Wides, R., and Yarden, Y. (2007b). Evolvable signaling networks of receptor tyrosine kinases: relevance of robustness to malignancy and to cancer therapy. Mol. Syst. Biol. 3, 151.
Amit, I., Garber, M., Chevrier, N., Leite, A.P., Donner, Y., Eisenhaure, T., Guttman, M., Grenier, J.K., Li, W., Zuk, O., et al. (2009). Unbiased reconstruction of a mammalian transcriptional network mediating pathogen responses. Science 326, 257–263.
Ashall, L., Horton, C.A., Nelson, D.E., Paszek, P., Harper, C.V., Sillitoe, K., Ryan, S., Spiller, D.G., Unitt, J.F., Broomhead, D.S., et al. (2009). Pulsatile stimulation determines timing and specificity of NF-kappaB-dependent transcription. Science 324, 242–246.
Badis, G., Berger, M.F., Philippakis, A.A., Talukder, S., Gehrke, A.R., Jaeger, S.A., Chan, E.T., Metzler, G., Vedenko, A., Chen, X., et al. (2009). Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720–1723.
Bandura, D.R., Baranov, V.I., Ornatsky, O.I., Antonov, A., Kinach, R., Lou, X., Pavlov, S., Vorobiev, S., Dick, J.E., and Tanner, S.D. (2009). Mass cytometry: technique for real time single cell multitarget immunoassay based on inductively coupled plasma time-of-flight mass spectrometry. Anal. Chem. 81, 6813–6822.
Bansal, M., Belcastro, V., Ambesi-Impiombato, A., and di Bernardo, D. (2007). How to infer gene networks from expression profiles. Mol. Syst. Biol. 3, 78.
Basma, H., Soto-Gutierrez, A., Yannam, G.R., Liu, L., Ito, R., Yamamoto, T., Ellis, E., Carson, S.D., Sato, S., Chen, Y., et al. (2009). Differentiation and transplantation of human embryonic stem cell-derived hepatocytes. Gastroenterology 136, 990–999.
Basu, S., Mehreja, R., Thiberge, S., Chen, M.T., and Weiss, R. (2004). Spatiotemporal control of gene expression with pulse-generating networks. Proc. Natl. Acad. Sci. USA 101, 6355–6360.
Bolouri, H., and Davidson, E.H. (2003). Transcriptional regulatory cascades in development: initial rates, not steady state, determine network kinetics. Proc. Natl. Acad. Sci. USA 100, 9371–9376.
Braun, E., and Brenner, N. (2004). Transient responses and adaptation to steady state in a eukaryotic gene regulation system. Phys. Biol. 1, 67–76.
Bruce, A.W., Lopez-Contreras, A.J., Flicek, P., Down, T.A., Dhami, P., Dillon, S.C., Koch, C.M., Langford, C.F., Dunham, I., Andrews, R.M., et al. (2009). Functional diversity for REST (NRSF) is defined by in vivo binding affinity hierarchies at the DNA sequence level. Genome Res. 19, 994–1005.
Cai, L., Dalal, C.K., and Elowitz, M.B. (2008). Frequency-modulated nuclear localization bursts coordinate gene regulation. Nature 455, 485–490.
Camas, F.M., Blazquez, J., and Poyatos, J.F. (2006). Autogenous and nonautogenous control of response in a genetic network. Proc. Natl. Acad. Sci. USA 103, 12718–12723.
Cantone, I., Marucci, L., Iorio, F., Ricci, M.A., Belcastro, V., Bansal, M., Santini, S., di Bernardo, M., di Bernardo, D., and Cosma, M.P. (2009). A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell 137, 172–181.
Capaldi, A.P., Kaplan, T., Liu, Y., Habib, N., Regev, A., Friedman, N., and O’Shea, E.K. (2008). Structure and function of a transcriptional network activated by the MAPK Hog1. Nat. Genet. 40, 1300–1306.
Chechik, G., and Koller, D. (2009). Timing of gene expression responses to environmental changes. J. Comput. Biol. 16, 279–290.
Chechik, G., Oh, E., Rando, O., Weissman, J., Regev, A., and Koller, D. (2008). Activity motifs reveal principles of timing in transcriptional control of the yeast metabolic network. Nat. Biotechnol. 26, 1251–1259.

Chu, S., DeRisi, J., Eisen, M., Mulholland, J., Botstein, D., Brown, P.O., and Herskowitz, I. (1998). The transcriptional program of sporulation in budding yeast. Science 282, 699–705.
Costanzo, M., Baryshnikova, A., Bellay, J., Kim, Y., Spear, E.D., Sevier, C.S., Ding, H., Koh, J.L., Toufighi, K., Mostafavi, S., et al. (2010). The genetic landscape of a cell. Science 327, 425–431.
Covert, M.W., Leung, T.H., Gaston, J.E., and Baltimore, D. (2005). Achieving stability of lipopolysaccharide-induced NF-kappaB activation. Science 309, 1854–1857.
Davidson, E.H. (2009). Network design principles from the sea urchin embryo. Curr. Opin. Genet. Dev. 19, 535–540.
Davidson, E.H. (2010). Emerging properties of animal gene regulatory networks. Nature 468, 911–920.
Deal, R.B., Henikoff, J.G., and Henikoff, S. (2010). Genome-wide kinetics of nucleosome turnover determined by metabolic labeling of histones. Science 328, 1161–1164.
Eng, K.H., Kvitek, D.J., Keles, S., and Gasch, A.P. (2010). Transient genotype-by-environment interactions following environmental shock provide a source of expression variation for essential genes. Genetics 184, 587–593.
FANTOM consortium, Suzuki, H., Forrest, A.R., van Nimwegen, E., Daub, C.O., Balwierz, P.J., Irvine, K.M., Lassmann, T., Ravasi, T., Hasegawa, Y., et al. (2009). The transcriptional network that controls growth arrest and differentiation in a human myeloid leukemia cell line. Nat. Genet. 41, 553–562.
Foster, S.L., Hargreaves, D.C., and Medzhitov, R. (2007). Gene-specific control of inflammation by TLR-induced chromatin modifications. Nature 447, 972–978.
Friedman, N., Vardi, S., Ronen, M., Alon, U., and Stavans, J. (2005). Precise temporal modulation in the response of the SOS DNA repair network in individual bacteria. PLoS Biol. 3, e238.
Friedrichsen, S., Harper, C.V., Semprini, S., Wilding, M., Adamson, A.D., Spiller, D.G., Nelson, G., Mullins, J.J., White, M.R., and Davis, J.R. (2006). Tumor necrosis factor-alpha activates the human prolactin gene promoter via nuclear factor-kappaB signaling. Endocrinology 147, 773–781.
Gasch, A.P., Spellman, P.T., Kao, C.M., Carmel-Harel, O., Eisen, M.B., Storz, G., Botstein, D., and Brown, P.O. (2000). Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell 11, 4241–4257.
Gertz, J., Siggia, E.D., and Cohen, B.A. (2009). Analysis of combinatorial cis-regulation in synthetic and genomic promoters. Nature 457, 215–218.
Geva-Zatorsky, N., Rosenfeld, N., Itzkovitz, S., Milo, R., Sigal, A., Dekel, E., Yarnitzky, T., Liron, Y., Polak, P., Lahav, G., et al. (2006). Oscillations and variability in the p53 system. Mol. Syst. Biol. 2, 0033.
Gilchrist, M., Thorsson, V., Li, B., Rust, A.G., Korb, M., Roach, J.C., Kennedy, K., Hai, T., Bolouri, H., and Aderem, A. (2006). Systems biology approaches identify ATF3 as a negative regulator of Toll-like receptor 4. Nature 441, 173–178.
Goentoro, L., Shoval, O., Kirschner, M.W., and Alon, U. (2009). The incoherent feedforward loop can provide fold-change detection in gene regulation. Mol. Cell 36, 894–899.
Hager, G.L., McNally, J.G., and Misteli, T. (2009). Transcription dynamics. Mol. Cell 35, 741–753.
Harbison, C.T., Gordon, D.B., Lee, T.I., Rinaldi, N.J., Macisaac, K.D., Danford, T.W., Hannett, N.M., Tagne, J.B., Reynolds, D.B., Yoo, J., et al. (2004). Transcriptional regulatory code of a eukaryotic genome. Nature 431, 99–104.
Hooshangi, S., Thiberge, S., and Weiss, R. (2005). Ultrasensitivity and noise propagation in a synthetic transcriptional cascade. Proc. Natl. Acad. Sci. USA 102, 3581–3586.
Hu, Z., Killion, P.J., and Iyer, V.R. (2007). Genetic reconstruction of a functional transcriptional regulatory network. Nat. Genet. 39, 683–687.
Ihmels, J., Levy, R., and Barkai, N. (2004). Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat. Biotechnol. 22, 86–92.
Iliopoulos, D., Hirsch, H.A., and Struhl, K. (2009). An epigenetic switch involving NF-kappaB, Lin28, Let-7 MicroRNA, and IL6 links inflammation to cell transformation. Cell 139, 693–706.
Ishihara, S., Fujimoto, K., and Shibata, T. (2005). Cross talking of network motifs in gene regulation that generates temporal pulses and spatial stripes. Genes Cells 10, 1025–1038.
Kalir, S., Mangan, S., and Alon, U. (2005). A coherent feed-forward loop with a SUM input function prolongs flagella expression in Escherichia coli. Mol. Syst. Biol. 1, 0006.
Kaplan, N., Moore, I.K., Fondufe-Mittendorf, Y., Gossett, A.J., Tillo, D., Field, Y., LeProust, E.M., Hughes, T.R., Lieb, J.D., Widom, J., et al. (2009). The DNA-encoded nucleosome organization of a eukaryotic genome. Nature 458, 362–366.
Kaplan, S., Bren, A., Dekel, E., and Alon, U. (2008). The incoherent feed-forward loop can generate non-monotonic input functions for genes. Mol. Syst. Biol. 4, 203.
Karlebach, G., and Shamir, R. (2008). Modelling and analysis of gene regulatory networks. Nat. Rev. Mol. Cell Biol. 9, 770–780.
Kern, D., Collins, M., Fultz, T., Detmer, J., Hamren, S., Peterkin, J.J., Sheridan, P., Urdea, M., White, R., Yeghiazarian, T., et al. (1996). An enhanced-sensitivity branched-DNA assay for quantification of human immunodeficiency virus type 1 RNA in plasma. J. Clin. Microbiol. 34, 3196–3202.
Kim, H.D., Shay, T., O’Shea, E.K., and Regev, A. (2009). Transcriptional regulatory circuits: predicting numbers from alphabets. Science 325, 429–432.
Kim, J., Chu, J., Shen, X., Wang, J., and Orkin, S.H. (2008). An extended transcriptional network for pluripotency of embryonic stem cells. Cell 132, 1049–1061.
Kittisopikul, M., and Suel, G.M. (2010). Biological role of noise encoded in a genetic network motif. Proc. Natl. Acad. Sci. USA 107, 13300–13305.
Kultz, D. (2005). Molecular and evolutionary basis of the cellular stress response. Annu. Rev. Physiol. 67, 225–257.
Lachmann, A., Xu, H., Krishnan, J., Berger, S.I., Mazloom, A.R., and Ma’ayan, A. (2010). ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. Bioinformatics 26, 2438–2444.
Lahav, G., Rosenfeld, N., Sigal, A., Geva-Zatorsky, N., Levine, A.J., Elowitz, M.B., and Alon, U. (2004). Dynamics of the p53-Mdm2 feedback loop in individual cells. Nat. Genet. 36, 147–150.
Lam, F.H., Steger, D.J., and O’Shea, E.K. (2008). Chromatin decouples promoter threshold from dynamic range. Nature 453, 246–250.
Lee, T.I., Rinaldi, N.J., Robert, F., Odom, D.T., Bar-Joseph, Z., Gerber, G.K., Hannett, N.M., Harbison, C.T., Thompson, C.M., Simon, I., et al. (2002). Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804.
Li, J., Liu, Z.J., Pan, Y.C., Liu, Q., Fu, X., Cooper, N.G., Li, Y., Qiu, M., and Shi, T. (2007). Regulatory module network of basic/helix-loop-helix transcription factors in mouse brain. Genome Biol. 8, R244.
Litvak, V., Ramsey, S.A., Rust, A.G., Zak, D.E., Kennedy, K.A., Lampano, A.E., Nykter, M., Shmulevich, I., and Aderem, A. (2009). Function of C/EBPdelta in a regulatory circuit that discriminates between transient and persistent TLR4-induced signals. Nat. Immunol. 10, 437–443.
Locke, J.C., and Elowitz, M.B. (2009). Using movies to analyse gene circuit dynamics in single cells. Nat. Rev. Microbiol. 7, 383–392.
Loewer, A., Batchelor, E., Gaglia, G., and Lahav, G. (2010). Basal dynamics of p53 reveal transcriptionally attenuated pulses in cycling cells. Cell 142, 89–100.
Lopez-Maury, L., Marguerat, S., and Bahler, J. (2008). Tuning gene expression to changing environments: from rapid responses to evolutionary adaptation. Nat. Rev. Genet. 9, 583–593.
Ma, H.W., Kumar, B., Ditges, U., Gunzer, F., Buer, J., and Zeng, A.P. (2004). An extended transcriptional regulatory network of Escherichia coli and analysis of its hierarchical structure and network motifs. Nucleic Acids Res. 32, 6643–6649.

Cell 144, March 18, 2011 ª2011 Elsevier Inc. 895 Macarthur, B.D., Ma’ayan, A., and Lemischka, I.R. (2009). Systems biology of by opposing effects of mRNA production and degradation. Mol. Syst. Biol. stem cell fate and cellular reprogramming. Nat. Rev. Mol. Cell Biol. 10, 4, 223. 672–681. Shapira, S.D., Gat-Viks, I., Shum, B.O., Dricot, A., de Grace, M.M., Wu, L., Macia, J., Widder, S., and Sole, R. (2009). Specialized or flexible feed-forward Gupta, P.B., Hao, T., Silver, S.J., Root, D.E., et al. (2009). A physical and regu- loop motifs: a question of topology. BMC Syst. Biol. 3, 84. latory map of host-influenza interactions reveals pathways in H1N1 infection. Mangan, S., Zaslaver, A., and Alon, U. (2003). The coherent feedforward loop Cell 139, 1255–1267. serves as a sign-sensitive delay element in transcription networks. J. Mol. Biol. Shen-Orr, S.S., Milo, R., Mangan, S., and Alon, U. (2002). Network motifs in the 334, 197–204. transcriptional regulation network of Escherichia coli. Nat. Genet. 31, 64–68. Mangan, S., Itzkovitz, S., Zaslaver, A., and Alon, U. (2006). The incoherent Shivaswamy, S., Bhinge, A., Zhao, Y., Jones, S., Hirst, M., and Iyer, V.R. feed-forward loop accelerates the response-time of the gal system of Escher- (2008). Dynamic remodeling of individual nucleosomes across a eukaryotic ichia coli. J. Mol. Biol. 356, 1073–1081. genome in response to transcriptional perturbation. PLoS Biol. 6, e65. Mitchell, A., Romano, G.H., Groisman, B., Yona, A., Dekel, E., Kupiec, M., Da- Simon, I., Siegfried, Z., Ernst, J., and Bar-Joseph, Z. (2005). Combined static han, O., and Pilpel, Y. (2009). Adaptive prediction of environmental changes by and dynamic analysis for determining the quality of time-series expression microorganisms. Nature 460, 220–224. profiles. Nat. Biotechnol. 23, 1503–1508. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L., and Wold, B. (2008). 
Spellman, P.T., Sherlock, G., Zhang, M.Q., Iyer, V.R., Anders, K., Eisen, M.B., Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Brown, P.O., Botstein, D., and Futcher, B. (1998). Comprehensive identifica- Methods 5, 621–628. tion of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by Murphy, L.O., Smith, S., Chen, R.H., Fingar, D.C., and Blenis, J. (2002). Molec- microarray hybridization. Mol. Biol. Cell 9, 3273–3297. ular interpretation of ERK signal duration by immediate early gene products. Springer, M., Wykoff, D.D., Miller, N., and O’Shea, E.K. (2003). Partially phos- Nat. Cell Biol. 4, 556–564. phorylated Pho4 activates transcription of a subset of phosphate-responsive Murray, J.I., Whitfield, M.L., Trinklein, N.D., Myers, R.M., Brown, P.O., and genes. PLoS Biol. 1, E28. Botstein, D. (2004). Diverse and specific gene expression responses to Storey, J.D., Xiao, W., Leek, J.T., Tompkins, R.G., and Davis, R.W. (2005). stresses in cultured human cells. Mol. Biol. Cell 15, 2361–2374. Significance analysis of time course microarray experiments. Proc. Natl. Muzzey, D., and van Oudenaarden, A. (2009). Quantitative time-lapse fluores- Acad. Sci. USA 102, 12837–12842. cence microscopy in single cells. Annu. Rev. Cell Dev. Biol. 25, 301–327. Szita, N., Polizzi, K., Jaccard, N., and Baganz, F. (2010). Microfluidic Nachman, I., Regev, A., and Ramanathan, S. (2007). Dissecting timing vari- approaches for systems and synthetic biology. Curr. Opin. Biotechnol. 21, ability in yeast meiosis. Cell 131, 544–556. 517–523. Nelson, D.E., Ihekwaba, A.E., Elliott, M., Johnson, J.R., Gibney, C.A., Fore- Tagkopoulos, I., Liu, Y.C., and Tavazoie, S. (2008). Predictive behavior within man, B.E., Nelson, G., See, V., Horton, C.A., Spiller, D.G., et al. (2004). Oscil- microbial genetic networks. Science 320, 1313–1317. lations in NF-kappaB signaling control the dynamics of gene expression. Tanay, A. (2006). 
Extensive low-affinity transcriptional interactions in the yeast Science 306, 704–708. genome. Genome Res. 16, 962–972. Oliveri, P., Tu, Q., and Davidson, E.H. (2008). Global regulatory logic for spec- Tay, S., Hughey, J.J., Lee, T.K., Lipniacki, T., Quake, S.R., and Covert, M.W. ification of an embryonic cell lineage. Proc. Natl. Acad. Sci. USA 105, (2010). Single-cell NF-kappaB dynamics reveal digital activation and analogue 5955–5962. information processing. Nature 466, 267–271. Paszek, P., Ryan, S., Ashall, L., Sillitoe, K., Harper, C.V., Spiller, D.G., Rand, Tyson, J.J., Chen, K.C., and Novak, B. (2003). Sniffers, buzzers, toggles and D.A., and White, M.R. (2010). Population robustness arising from cellular blinkers: dynamics of regulatory and signaling pathways in the cell. Curr. heterogeneity. Proc. Natl. Acad. Sci. USA 107, 11644–11649. Opin. Cell Biol. 15, 221–231. Patwardhan, R.P., Lee, C., Litvin, O., Young, D.L., Pe’er, D., and Shendure, J. Walczak, A.M., Tkacik, G., and Bialek, W. (2010). Optimizing information flow (2009). High-resolution analysis of DNA regulatory elements by synthetic satu- in small genetic networks. II. Feed-forward interactions. Phys. Rev. E Stat. ration mutagenesis. Nat. Biotechnol. 27, 1173–1175. Nonlin. Soft Matter Phys. 81, 041905. Ramsey, S.A., Klemm, S.L., Zak, D.E., Kennedy, K.A., Thorsson, V., Li, B., Gil- Wei, G., Wei, L., Zhu, J., Zang, C., Hu-Li, J., Yao, Z., Cui, K., Kanno, Y., Roh, christ, M., Gold, E.S., Johnson, C.D., Litvak, V., et al. (2008). Uncovering T.Y., Watford, W.T., et al. (2009). Global mapping of H3K4me3 and H3K27me3 a macrophage transcriptional program by integrating evidence from motif reveals specificity and plasticity in lineage fate determination of differentiating scanning and expression dynamics. PLoS Comput. Biol. 4, e1000021. CD4+ T cells. Immunity 30, 155–167. Rappaport, N., Winter, S., and Barkai, N. (2005). 
The ups and downs of biolog- Whitehouse, I., Rando, O.J., Delrow, J., and Tsukiyama, T. (2007). Chromatin ical timers. Theor. Biol. Med. Model. 2, 22. remodelling at promoters suppresses antisense transcription. Nature 450, Raveh-Sadka, T., Levo, M., and Segal, E. (2009). Incorporating nucleosomes 1031–1035. into thermodynamic models of transcription regulation. Genome Res. 19, Wilkinson, D.J. (2009). Stochastic modelling for quantitative description of 1480–1496. heterogeneous biological systems. Nat. Rev. Genet. 10, 122–133. Rosenfeld, N., Elowitz, M.B., and Alon, U. (2002). Negative autoregulation Yuh, C.H., Bolouri, H., and Davidson, E.H. (2001). Cis-regulatory logic in the speeds the response times of transcription networks. J. Mol. Biol. 323, endo16 gene: switching from a specification to a differentiation mode of 785–793. control. Development 128, 617–629. Segal, E., Shapira, M., Regev, A., Pe’er, D., Botstein, D., Koller, D., and Fried- Zaslaver, A., Mayo, A.E., Rosenberg, R., Bashkin, P., Sberro, H., Tsalyuk, M., man, N. (2003). Module networks: identifying regulatory modules and their Surette, M.G., and Alon, U. (2004). Just-in-time transcription program in meta- condition-specific regulators from gene expression data. Nat. Genet. 34, bolic pathways. Nat. Genet. 36, 486–491. 166–176. Zinzen, R.P., Girardot, C., Gagneur, J., Braun, M., and Furlong, E.E. (2009). Shalem, O., Dahan, O., Levo, M., Martinez, M.R., Furman, I., Segal, E., and Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature Pilpel, Y. (2008). Transient transcriptional responses to stress are generated 462, 65–70.

Cell 144, March 18, 2011 ©2011 Elsevier Inc.

Leading Edge Review

Signaling from the Living Plasma Membrane

Hernán E. Grecco,1 Malte Schmick,1 and Philippe I.H. Bastiaens1,2,*
1Max Planck Institute for Molecular Physiology, Department of Systemic Cell Biology, Otto-Hahn-Str. 11, D-44227 Dortmund, Germany
2Chemical Biology Department, Faculty of Chemistry, TU-Dortmund, D-44227 Dortmund, Germany
*Correspondence: [email protected]
DOI 10.1016/j.cell.2011.01.029

Our understanding of the plasma membrane, once viewed simply as a static barrier, has been revolutionized to encompass a complex, dynamic organelle that integrates the cell with its extracellular environment. Here, we discuss how bidirectional signaling across the plasma membrane is achieved by striking a delicate balance between restriction and propagation of information over different scales of time and space and how underlying dynamic mechanisms give rise to rich, context-dependent signaling responses. In this Review, we show how computer simulations can generate counterintuitive predictions about the spatial organization of these complex processes.

The Plasma Membrane: A Dynamic Barrier

Biological systems operate within a carefully tailored balance of opposing tendencies, favoring one or the other in response to internal and external cues. Such duality between robustness and adaptability or between exploration of possibilities and commitment to a decision, for example, permeates every level of organization. The function of the plasma membrane is an excellent example of this duality, as it defines the cell by isolating it from the extracellular environment while at the same time integrating the cell with its surroundings by transferring messenger molecules or initiating reaction cascades within it. Isolation versus communication is therefore the precarious balance that the plasma membrane must continuously maintain, separating the outside from the inside while presenting each a representation of the other. To the inside, the plasma membrane summarizes the cell's "social" context while projecting the cell's state to the outside. For this reason, plasma membrane function is fundamental not only to keep a single cell alive, but also to maintain its proper behavior in the organismal collective.

The plasma membrane is composed of a bilayer of lipids and incorporated proteins, whose interactions as an ensemble enable it to receive, remember, process, and relay information along and across it. These interactions form a signal transduction hierarchy of interconnected time- and lengthscales bridging more than three orders of magnitude, from nanometer-sized proteins to the micrometer scale of the cell. Within each level of the hierarchy, the lengthscales (how far information will spread) are coupled with the timescales (how fast information will spread) through underlying physical-chemical mechanisms such as free diffusion, reaction-diffusion, or active transport.

In its most primitive state, the plasma membrane forms a spherical shell, 5 nm thick, and is permeable only to small nonpolar molecules such as oxygen and nitrogen. In a water-based environment, lipids shield their hydrophobic tails from the surrounding polar fluid, exposing their more hydrophilic heads. This arrangement minimizes the free energy of the water-lipid system and therefore occurs spontaneously. This property of self-assembly provided a convenient evolutionary path for the generation of a relatively stable supramolecular structure that shields its contents from the dissipative effects of diffusion (Griffiths, 2007). However, the plasma membrane of the modern cell is not a static, self-assembled system but is continuously renewed to preserve its nonequilibrium state. For example, its lipid composition is dynamically maintained by a combination of lipid synthesis and chemical conversion, vesicular fusion and fission events that tie into intracellular transport and sorting processes (van Meer et al., 2008). The lipids, which were previously thought to serve only a structural function, are themselves subject to chemical modification and can thereby relay signals. The resulting axial and lateral asymmetry of the membrane can be rapidly modulated to allow for bidirectional information transfer.

Lipids also provide a fluid matrix in which proteins reside and diffuse laterally (Zimmerberg and Gawrisch, 2006). These membrane proteins, which represent more than 50% of the cross-sectional area of the membrane in some cell types (Janmey and Kinnunen, 2006), provide the machinery for most of the plasma membrane's dynamic properties. In addition to structural and sensory functions, they mediate matter exchange with the environment, enabling the membrane to actively and passively regulate transport of substances across it, even against a concentration gradient. This, for example, can generate ion gradients across the membrane that have important physiological functions such as water homeostasis and electrical excitability.

Experimental work with model lipid membranes has nevertheless been one of the main sources of quantitative information about the physical properties of lipid bilayers (Janmey and Kinnunen, 2006). Biophysical parameters such as rigidity, tension, spontaneous curvature, and elastic moduli have thus been determined as a function of temperature, hydration, and lipid composition. Such model membranes clearly lack the dynamic

features of their real-life equivalents. For example, lipid composition is generally symmetrical in model membranes (Devaux and Morris, 2004), and phenomena such as membrane coupling to the cytoskeleton (Kwik et al., 2003) are difficult to reproduce in vitro.

Almost 40 years have passed since Singer and Nicolson wrote their seminal work detailing the fluid mosaic model of the plasma membrane (Singer and Nicolson, 1972). In this work, which elegantly integrated the experimental and theoretical knowledge of the time, the authors stated:

"Biological membranes play a crucial role in almost all cellular phenomena, yet our understanding of the molecular organization of membranes is still rudimentary. Experience has taught us, however, that in order to achieve a satisfactory understanding of how any biological system functions, the detailed molecular composition and structure of that system must be known."

In spite of the enormous amount of knowledge about the structure and composition of the plasma membrane gathered in recent decades, our understanding of it can still be considered "rudimentary" in light of the complexity of its dynamics that has become apparent since then. A major challenge will therefore be to animate our rather static view of the plasma membrane by bringing our model membrane systems to life in the test tube.

Here, we will discuss the impact that the dynamic, "living" membrane has on cellular information processing. From the extensive range of research available, we focus on examples that represent canonical mechanisms to constrain information within the cell, relying on the plasma membrane as a dynamically maintained supramolecular structure.

Signaling across a Dynamic Barrier: The Lateral Dimension

The bidirectional transduction of signals by the plasma membrane is modulated by its state, and the cell's historical context therefore determines its response to incoming signals. However, incoming signals also modify the state of the membrane, and in this way, the transducing medium becomes the message. How fast and to what extent this signal is propagated across the membrane must be tightly regulated to allow information to spread on a scale that is relevant to the biological process while preventing spurious responses. This requires a balance between responding to an actual signal and resisting spurious events induced by noise, a feat that is achieved by partitioning the plasma membrane into domains that span several time- and lengthscales, corresponding to the dimensions at which the biological processes operate.

The largest partitions of the plasma membrane occur at a micrometer scale. For example, the partitioning of epithelial cells into apical and basolateral domains generates a cellular polarity that enables transcytotic vectorial transport between two distinct extracellular environments (Mellman and Nelson, 2008; Rodriguez-Boulan et al., 2005; van der Wouden et al., 2003). This polarity is established and maintained from yeast to mammals by the PAR (partitioning defective)/protein kinase C system that amplifies a RHO family G protein (CDC42)-mediated polarity cue (Suzuki and Ohno, 2006). The resulting apical and basolateral domains maintain their different lipid and protein compositions dynamically through the life cycle of the cell (Muth and Caplan, 2003; Simons and van Meer, 1988). The living plasma membrane, together with PAR proteins, forms a self-referencing system that establishes polarity by mechanochemically restructuring the cell.

Though such a coarse-grained partitioning provides the cell with a stable polarized structure on a timescale of days, it does not supply a sufficiently rapid response mechanism that is appropriate for localized cues such as those that occur during chemotaxis, for which a short-term, fine-grained spatial memory is needed. In model membranes, protein diffusion is fast enough (D ≈ 5 μm²/s) to equilibrate across microns within seconds (Ramadurai et al., 2010). Under these conditions, localized signals such as activated receptors would redistribute across an area equivalent to the cell surface in a few minutes. In such homogeneous membrane systems, the diffusion coefficient scales logarithmically with the inverse of the diffusant radius (Saffman and Delbrück, 1975). This implies that slowing down diffusion through oligomerization of receptors is not enough to constrain mobility. For example, an oligomer of a hundred monomers will still diffuse at half the speed of a monomer. Therefore, diffusion of proteins must be contained in order to maintain spatial memory with micrometer precision over a timescale of minutes.

Single-molecule experiments have shown that the plasma membrane is partitioned into 50–300 nm wide domains by the combined action of actin-based membrane skeleton "fences" and anchored-transmembrane protein "pickets" (Kusumi and Sako, 1996). Within these membrane domains, proteins and lipids are highly mobile, with nanoscopic diffusion coefficients in the order of those measured for model membranes (Kusumi et al., 2005). The hopping of signaling proteins across the fences occurs with low probability and thereby becomes the rate-limiting factor in lateral information transfer. In contrast to diffusion, the hopping rate is strongly dependent on the size of proteins, and therefore ligand-induced oligomerization of activated receptors traps the signal within these domains (Nelson et al., 1999). Such oligomerization-induced trapping thus provides a mechanism for the maintenance of spatial memory. Conversely, the confinement of monomers due to their low hopping rate facilitates oligomerization within domains. These domains can be considered as well-mixed protein reaction vessels because the time needed to diffuse through them is two orders of magnitude smaller (150 μs) than the residence time within them (15 ms) (Kusumi et al., 2005). Compartmentalization therefore increases the rate of interaction between receptors. This has important implications for proteins such as the epidermal growth factor receptor (EGFR), which can oligomerize even in the absence of ligand. In cells expressing moderate numbers of EGFR molecules such as BAF/3 or COS7 (5 × 10⁴ receptors/cell), the number of receptors per membrane domain, and therefore the degree of receptor clustering, will be low (2–3 receptors per cluster, consistent with the observations of Clayton et al. [2005]). However, in cancer cell lines that express abnormally high levels of EGFR such as A431 (2 × 10⁶ receptors/cell), the average size of transient clusters will be much higher (10–15 receptors per cluster, consistent with the observation of Zidovetzki et al. [1981]) (Figure 1A). These preformed clusters

have a profound impact on the propagation of receptor signals, as they spatially modulate the basal activity and reactivity of the plasma membrane.

Figure 1. Mechanisms Regulating Lateral Signaling across the Plasma Membrane
(A) The reactivity of the plasma membrane is modulated by spatial constraints. Schematic depiction of receptor tyrosine kinase density in cytoskeleton-mediated membrane domains (top) and corresponding distributions (bottom) for different amounts of receptor per cell (N).
(B) A bistable system generated by a receptor tyrosine kinase that inhibits its own inhibitory protein tyrosine phosphatase. Two-dimensional time evolution of membrane-bound receptor activation after an initial local point activation shows spreading of the signal.
(C) A Turing system generated by a receptor tyrosine kinase that activates its own inhibitory protein tyrosine phosphatase. Two-dimensional time evolution after global stimulation shows the generation of kinase activity hot spots in the plasma membrane.

On a mechanistic level, the transmission of signals by receptor tyrosine kinases (RTKs) such as EGFR is relatively well understood. Binding of the cognate ligand to receptors promotes their dimerization and thereby enables their phosphorylation in trans via their intrinsic tyrosine kinase activity (Lemmon and Schlessinger, 2010). The resulting phosphorylated tyrosine residues, exposed to the cytoplasm, act as docking sites for proteins

that contain specialized domains, such as SH2 or PTB domains (Lim and Pawson, 2010). Their recruitment induces allosteric changes in enzymatic activity or binding affinity on another module of the docked molecule, conveying signals deeper into the cytoplasm (Deribe et al., 2010). However, though these sequences of reaction events provide insight into how signals are transferred across the membrane into the cell, they do not provide information on how these signals are regulated in space and time. To achieve this, we must consider the collective behavior of the ensemble of signaling molecules in the plasma membrane that have an influence on receptor phosphorylation state. Even at a low hopping rate of clustered receptors, the basal kinase activity of RTKs will eventually result in their full phosphorylation in the absence of a countering phosphatase activity (Reynolds et al., 2003). The degree of receptor phosphorylation within the plasma membrane is therefore determined by a continuous cycle of phosphorylation and dephosphorylation. Growth factor binding increases the amount of phosphorylated receptors by shifting the kinase-phosphatase balance in favor of the kinase.

Though membrane-tethered and cytosolic proteins that are activated by receptors but not confined to the domains might propagate signals, their rather slow microscopic diffusion is incompatible with the timescales of minutes observed for such phenomena (Tischer and Bastiaens, 2003). Small molecule second messengers, such as calcium or reactive oxygen species (ROS), like hydrogen peroxide, have much larger diffusion coefficients, thereby propagating information via diffusion more quickly. Previously seen as a reaction by-product that causes oxidative stress, hydrogen peroxide has gained increasing interest as a mediator in signaling (Rhee, 2006).

Hydrogen peroxide is produced from the dismutation of superoxide generated by enzyme systems such as NADPH oxidase (NOX) (Brown and Griendling, 2009). Seven NOX catalytic subunits have been identified (NOX1, NOX2, NOX3, NOX4, NOX5, DUOX1, and DUOX2) that generate superoxide by transferring an electron from NADPH to molecular oxygen. The best-characterized NADPH oxidase, phagocytic NOX2, is a multisubunit enzyme complex with both transmembrane and cytosolic components. Upon stimulation, the cytosolic subunits are translocated to the membrane to bind the membrane-associated components, leading to activation of the NADPH oxidase complex. This activation process is triggered by growth factor receptor activation through the phosphorylation of two cytoplasmic subunits, P47PHOX and P67PHOX, and the conversion of GDP-bound RAC1 into GTP-bound forms through the activation of a RAC guanine nucleotide exchange factor (GEF) (Finkel, 2006). RAC GEFs such as βPIX are recruited via their pleckstrin homology domain, and the resulting increase in RAC activity is presumed to stimulate NOX directly (Finkel, 2006). Importantly, NOX enzymes produce superoxide on the outer leaflet of the plasma membrane, after which it dismutates to hydrogen peroxide and diffuses back into the cell (Rhee, 2006). Hydrogen peroxide has been shown to inactivate PTPs such as PTP1B by reversible oxidation of a reactive cysteine in the catalytic cleft (Janssen-Heininger et al., 2008). The hydrogen peroxide-mediated coupling of RTK activation with PTP inhibition (Lee et al., 1998) therefore exemplifies a double-negative feedback loop, which, together with the autocatalytic kinase activity of the receptor, results in a bistable system (Reynolds et al., 2003; Tischer and Bastiaens, 2003). This reaction network effectively operates in a nanoenvironment that is local to the activated receptor due to the short half-life of intracellular ROS, which are the target of very efficient antioxidant enzymes such as peroxiredoxin I (PRXI) (Woo et al., 2010). Although spatially constrained, the presence of ROS can still lower the excitation threshold of neighboring, inactive receptors. Receptor density then becomes the key to trigger a domino-like rapid propagation of activity at long range, whereby the RTK/PTP/H₂O₂ system acts as an excitable medium (Figure 1B) (Reynolds et al., 2003).

This global activation initiated by a local source is only possible due to the tight coupling between reaction components that have opposing activities. Insight into the nature of such coupling is required to predict the spatial outcome of reaction-diffusion systems. Consider the case of activated, phosphorylated RTKs that can activate their own inhibitors such as the PTP SHP1. Here, phosphotyrosines on the activated RTK bind SHP1 via its SH2 domain, which allosterically activates the phosphatase. Phosphorylation of SHP1 by the RTK then locks it into the active state, irrespective of binding to the RTK (Frank et al., 2004; Uchida et al., 1994). In spite of the high similarity with the excitable media network structure discussed above, the spatial outcome is exactly the opposite, as it focuses global activation into local hot spots (Figure 1C). The theoretical basis for the emergence of such large-scale patterns from random local fluctuations was proposed by Turing in 1952 (Turing, 1952).

In addition to cell-wide and submicroscopic domains with lifetimes ranging from days to seconds, even smaller (nanometer scale), shorter-lived (subsecond) domains have been proposed to transiently confine membrane proteins (Simons and Ikonen, 1997). These high-viscosity patches composed of cholesterol and glycosphingolipid are known as lipid rafts and have been shown to have an important role as labile platforms to which signaling components are recruited, favoring their interaction (Harding and Hancock, 2008). We refer the reader to some excellent recent reviews for more information about this extensive topic (Lingwood and Simons, 2010) that goes beyond the scope of this Review.

Signaling across a Dynamic Barrier: The Axial Dimension

Axial signal propagation into the cytoplasm by phosphorylation of soluble substrates, like lateral signal propagation, is also tightly controlled by reaction-diffusion systems that generate a local environment of activated substrates. For example, transfer of growth factor signals from RTKs in the plasma membrane to soluble substrates in the cytoplasm also depends on cyclic reaction-diffusion systems of opposing tyrosine kinase/phosphatase activities. However, the catalytic activity of fully active PTPs is up to three orders of magnitude higher than that of tyrosine kinases (Fischer et al., 1991), which would preclude the effective transfer of growth factor signals via phosphorylation in the cytoplasm. On the other hand, the absence of PTP activity near the plasma membrane would allow spurious signals to be transmitted in the cell. The solution to this dilemma is the

membrane-proximal, partial inactivation of PTPs by oxidation of the catalytic cysteine with hydrogen peroxide that is produced by NOX as outlined above. The reducing activity of the cytoplasm (sink) together with the source of hydrogen peroxide production at the plasma membrane generates a hydrogen peroxide gradient in the cytoplasm in which PTP activity is strongly reduced near the membrane. Thus, signal penetration via tyrosine phosphorylation is ultimately a self-referencing system in which tyrosine phosphorylation depends on the magnitude of the hydrogen peroxide gradient, which in turn depends on the balance between RTK and PTP activities.

The extent of feedback in this system became even more apparent with the recent identification of PRXI as a major reducing agent that controls hydrogen peroxide levels in the cytoplasm (Woo et al., 2010). Importantly, its activity is inhibited by phosphorylation mediated by membrane-bound Src on tyrosine 194, thereby generating a local positive feedback loop around activated RTKs. We might therefore expect that the resulting, more extended downregulation of PTP activity in the cytoplasm allows more efficient penetration of signals via soluble phosphorylated substrates of the RTKs into the cytoplasm. In order to verify this intuition, we performed cellular automata simulations (Markus et al., 1999) of this reaction-diffusion system using realistic parameters (Figure 2). In this simulation, we tracked the spatial and temporal evolution of the reaction of the RTK network described above following a ligand-binding event. The outcome of the simulation showed that coupling of PRXI activity to RTK activity has only a marginal effect on signal penetration in the cytoplasm. The surprise was that the major effect of this coupling was on the excitability of the receptor system in the membrane, in that the excitation threshold is lowered. This demonstrates one of the underestimated values of simulations, namely to guide our sometimes faulty intuitions about dynamic processes.

Signaling is often perceived as the linear transfer of information from the plasma membrane to the nucleus, where it regulates the global state of the cell by modulating gene expression. However, signaling from the plasma membrane can also generate actively maintained local cytoplasmic states that can act as morphogenetic cues through their effect on the cytoskeleton and membranes (Dehmelt and Bastiaens, 2010). How can these local cytoplasmic states be generated? From a biochemical perspective, transfer of information in the cytoplasm mostly occurs via reversible enzymatic posttranslational modification or

Figure 2. H2O2-Dependent Regulation of Signal Penetration in the Cytoplasm
(A) Schematic representation of a double-negative kinase phosphatase feedback system that regulates substrate phosphorylation. Receptor tyrosine kinases (RTKs), their phosphatases (PTPs), and their substrates (S), such as PRXI, are present in two states. Conversion between states is denoted by curved arrows; mediation of conversion, straight arrows. Phosphorylated species are denoted by subscript p; active species, subscript a; inactive, subscript i. In case of PRXI, the phosphorylated state is the inactive state. Phosphorylation of the receptor mediates extracellular H2O2 production via recruited NOX, and active PRXI reduces H2O2.
(B) Computer simulation of diffusion and reaction of substances as detailed in (A) with a cellular automaton approach. Simulations were performed in the presence (left) or absence (right) of PRXI regulation. Lateral and axial concentration profile 50 ms (first row) and 170 ms (second row) after ligand binding to a receptor tyrosine kinase at the membrane, separating the cytosol from the extracellular domain (black). (Third row) Lateral membrane receptor phosphorylation levels. Coloring indicates the propagation in time of the high levels of phosphorylated receptor across the membrane. The penetration of phosphorylated substrate Sp is only marginally affected when PRXI is regulated. However, the lateral speed of membrane signal propagation is increased due to increased receptor reactivity. Parameters were chosen in accordance with the literature, as available, and to preserve bistability of activation of RTK in the membrane (Reynolds et al., 2003). Specifically: D_receptor = 0.2 µm²/s; D_substrate = 10 µm²/s; D_ROS = 100 µm²/s; k_PTP = 25 s⁻¹; k_RTK = 1 s⁻¹; k_NOX = 10 s⁻¹; k_PRXIa = 100 s⁻¹; k_ROS = 5 s⁻¹.
(C) Axial concentration profile of Sp, H2O2, PTPp, and PRXIp. Blue curves depict regulation of PRXI by RTK/PTP; green curves are in absence of PRXI regulation, as in (B); red curves are in absence of PTP inactivation via H2O2 to demonstrate the high impact of PTP regulation on the extent of substrate phosphorylation.
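The diffusion coefficients and rate constants listed in the Figure 2 caption set how far each species can spread. As a rough guide, and assuming one-dimensional diffusion away from a planar source with uniform first-order removal (an idealization, not the cellular automaton used for the figure), a steady-state profile decays exponentially with a characteristic length sqrt(D/k). The sketch below evaluates this length for the caption values. Pairing D_substrate with k_PTP and D_ROS with k_ROS is an assumption made here for illustration, and the function names are hypothetical.

```python
import math


def decay_length(diffusion_um2_per_s: float, rate_per_s: float) -> float:
    """Characteristic decay length (µm) of a steady-state gradient: sqrt(D / k)."""
    return math.sqrt(diffusion_um2_per_s / rate_per_s)


def relative_level(distance_um: float, length_um: float) -> float:
    """Level at a given distance from the source, relative to the level at the source."""
    return math.exp(-distance_um / length_um)


# Values taken from the Figure 2 caption; the D/k pairings are assumptions.
D_SUBSTRATE, K_PTP = 10.0, 25.0  # µm²/s and 1/s
D_ROS, K_ROS = 100.0, 5.0        # µm²/s and 1/s

substrate_length = decay_length(D_SUBSTRATE, K_PTP)
ros_length = decay_length(D_ROS, K_ROS)

if __name__ == "__main__":
    print(f"phosphorylated substrate decay length: {substrate_length:.2f} µm")
    print(f"H2O2 decay length: {ros_length:.2f} µm")
    level = relative_level(2.0, substrate_length)
    print(f"substrate level 2 µm from the membrane: {level:.3f}")
```

Under these assumptions the phosphorylated substrate is confined to roughly the first micrometer beneath the membrane, whereas H2O2 reaches several micrometers.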

Cell 144, March 18, 2011 ©2011 Elsevier Inc. 901

induced conformational changes of soluble proteins (Deribe et al., 2010). Both types of changes are generally counteracted by spontaneous or enzymatic reversion to the original state while diffusion spreads the "activity" from its source. This can lead to the formation of activity gradients if the source of information transfer is localized to a supramolecular structure that is surrounded by a sink where the reverse reaction occurs. For example, as discussed above, a plasma membrane-bound RTK as a source, together with a soluble cytosolic PTP as a sink, can form a cyclic reaction by acting on a phosphorylatable, cytosolic substrate. This system induces a local "cytoplasmic state" (Niethammer et al., 2007) by generating a membrane-proximal gradient of phosphorylated substrate that extends into the cytoplasm. Such a phosphorylation gradient emanating from the plasma membrane was, for example, observed for the microtubule regulator stathmin/OP18, locally switching off its microtubule destabilizing activity and therefore enhancing microtubule density in the lamellipodia of migrating cells (Niethammer et al., 2004).

The duality of containing and propagating signals from the plasma membrane also becomes apparent in the gradient of mitogen-activated protein kinase (MAPK) activity emanating from the shmoo: a mating projection that occurs in response to pheromone in budding yeast cells (Maeder et al., 2007). Here, the MAPK kinase (STE7), which phosphorylates MAPK (called FUS3 in budding yeast), is localized to the plasma membrane via interaction with the scaffold protein STE5. The sink in the cytoplasm is provided by the homogeneously distributed phosphatases PTP3 and MSG5. In order for this system to generate a gradient, phosphorylation of FUS3 must decrease its affinity for STE5 (van Drogen et al., 2001). This scaffold is itself recruited to the membrane by interaction with the liberated βγ subunits (STE4/STE18) of a trimeric G protein after activation of the G protein-coupled pheromone receptor STE2/3. Local plasma membrane composition also regulates STE5 localization in that phosphatidylinositol 4,5-bisphosphate is required in the shmoo membrane for its targeting (Garrenton et al., 2010). The ensuing local cytoplasmic state that contains active FUS3 proximal to the plasma membrane has been suggested to maintain the structure of the shmoo by local phosphorylation of the actin cable-regulating formin, BNI1 (Matheos et al., 2004). However, the dimension of the FUS3 gradient is extensive enough to reach the nucleus of the small yeast cell, where active import causes an enrichment of phosphorylated, active FUS3 (Maeder et al., 2007). This gradient/nuclear import system thereby generates two functional compartments with high levels of active FUS3. In the shmoo, active FUS3 maintains the structure of the cytoskeleton, whereas in the nucleus, it affects gene expression.

To achieve this kind of dual purpose signaling in much larger mammalian cells, the distance from the plasma membrane to the nucleus must be bridged while retaining membrane-proximal concentrations of phosphorylated substrates. Longer distances can be covered by a cascade of coupled reaction cycles, which generate secondary shallower gradients from primary steep ones (Stelling and Kholodenko, 2009). The extent of this secondary gradient system resolves the problem of signal penetration but fails to provide an effective means of independently generating both global (transcriptional) and local (morphogenetic) signals. The latter can be achieved by scaffold proteins recruiting source activities directly to the plasma membrane, thereby locally constraining signaling. For example, by associating the source kinase MEK with its substrate ERK, the scaffold KSR1 maintains a primary gradient of double-phosphorylated ERK proximal to the plasma membrane, just as STE5 does in yeast. How then is the duality between local and global signaling resolved? By taking into account that scaffolds usually have a lower concentration in the cell than their clients (Lee et al., 2003; Maeder et al., 2007), it becomes clear that scaffold-substrate coupling does not represent the totality of signaling activity. Instead, it is part of a functional module that uses the soluble building blocks of the canonical signaling pathway. The localized functionality of this signaling subset is therefore distinct from the canonical function of the soluble MAPK-signaling pathway.

Which functions are then associated with the soluble part of the pathway, and which are associated with the scaffold? One possibility is that proliferative growth factor signals are transmitted to the nucleus via a short-lived, secondary gradient in the "soluble" MAPK module. Negative feedback loops observed in the MAPK module could only occur in the soluble cytoplasmic state and might account for the pulse-like MAPK response typically observed after proliferative growth factor signals such as EGF (Marshall, 1995). One such well-known negative feedback loop is the inhibitory phosphorylation of SOS by ERK (Buday et al., 1995). The resulting pulse of MAPK activity is sensed in the nucleus by mechanisms such as incoherent feed-forward loops that operate as fold change detectors, thereby translating relative activity changes into modified patterns of gene expression (Goentoro et al., 2009).

When MAPK signaling is restrained by a scaffold at the plasma membrane, this negative feedback might be absent or changed to positive feedback to allow for prolonged but membrane-proximal signaling via ERK gradients. Such persistent cytoplasmic signaling might affect the local state of the cytoskeleton to induce and maintain specific morphologies. In agreement with this notion, it was found that proliferative EGF stimulation leads to transient ERK activity in both the nucleus and the cytoplasm of MCF7 cells, whereas differentiating Heregulin stimulation leads to sustained ERK activity in the cytoplasm and transient activity in the nucleus (Nakakuki et al., 2010).

Intracellular Membranes: The Extended Axial Dimension

Based on the previous section, one could argue that cytoplasmic signals that affect the morphology of the cell via the cytoskeleton are mostly maintained in the proximity of the plasma membrane by gradient-generating mechanisms, whereas nuclear signals that propagate through the cytoplasm and affect gene expression are based on a temporal code. The endocytic system can effectively extend the axial reach of plasma membrane signaling deep into the cytoplasm by transferring the source of gradient-generating systems to vesicles within the cytoplasm (Birtwistle and Kholodenko, 2009). This allows signals to penetrate deeper into the cytoplasm without necessarily affecting gene expression. The lateral and axial propagation of RTK signals emanating from the plasma membrane also needs to be acutely terminated by removing and deactivating the RTK source of the signal in
order to avoid uncontrolled signal spread and to resensitize the membrane for extracellular signals. Endocytosis of activated receptors is one of the mechanisms for such a task (Wiley and Burke, 2001). Though a comprehensive description of the endocytic machinery is outside the scope of this Review, we would like to discuss two functionalities of this system that are relevant to our discussion of the regulation of signaling by the living plasma membrane: the endosomal pathway that leads to lysosomal receptor degradation and endosomal-plasma membrane recycling.

Depending on their type, activation state, and cellular context (Le Roy and Wrana, 2005), activated growth factor receptors are trapped in clathrin-coated pits, caveolae, or both. For example, EGFR internalization occurs mostly through clathrin-mediated endocytosis activated by the ubiquitin ligase CBL within minutes of ligand stimulation. A fraction of the activated RTKs (e.g., 30% in case of EGFR) is targeted for degradation using the RAB7-dependent degradative route from late endosomes through multivesicular bodies (MVB) to lysosomes. In this route, the SNARE system (van den Bogaart et al., 2010) is important for the fusion of vesicular systems, and the ubiquitin-regulated ESCRT complexes are important for generating vesicles inside the MVB. Receptor recycling occurs through the slow (t1/2 ≈ 20 min through RAB8 and RAB11) and fast (t1/2 ≈ 5 min through RAB4) recycling pathways for clathrin-mediated endocytosis (Sorkin et al., 1991). The G protein ARF6 is responsible for recycling of nonclathrin-mediated endocytosed receptors. For more detail on the workings of the endocytic machinery, we refer the reader to some recent excellent reviews (Miaczynska et al., 2004; Scita and Di Fiore, 2010; Sorkin and von Zastrow, 2009) and the references therein.

Recycling of activated receptors through the endocytic machinery constitutes a simple mechanism to reset the state of the plasma membrane in order to allow further stimuli after a refractory period. The endocytic system effectively regulates the response properties of the plasma membrane as a signaling entity by controlling the availability of receptors at the cell surface. As we have already discussed, the concentration of receptors in the plasma membrane can change the qualitative behavior of the signaling response, as shown for COS7 cells in which lateral propagation of EGF local activation was only possible upon EGFR overexpression. However, in this work, it was also shown that the same result could be achieved by blocking endocytosis, which augments the poststimulus concentration of receptors in the plasma membrane (Sawano et al., 2002). In the case of the PDGFβ receptor, which is usually not recycled but degraded, recycling can be induced by the loss of T cell PTP activity. This results in an enrichment of the phosphorylated receptor and an increase in PLCγ activity at the plasma membrane that stimulates cell migration (Karlsson et al., 2006). Both examples show that receptor recycling can change the response of the cell by altering the qualitative behavior of the ensemble of receptors at the plasma membrane in response to a stimulus. However, continuous endosomal cycling may also play a role in maintaining and propagating a signal. For example, transforming growth factor (TGF)-β stimulation activates the SMAD system, which is shuttled from cytosol to nucleus and back (Inman et al., 2002). It was shown that, by cycling, the signal transducers continuously monitor receptor activity. In a similar fashion, endosomes act as a constitutive sensing mechanism that could relay information between the plasma membrane and cytosol.

Another key aspect of the endocytic system is that it generates a mobile local cytoplasmic state by moving the signaling source through the inactivating cytoplasm. The concept of signaling endosomes as entities in which selective and regulated RTK signal transduction occurs was first described in the mid-1990s (Baass et al., 1995; Grimes et al., 1996). Functional microscopy experiments provided further evidence that cytosolic signaling proteins are recruited by activated receptors in endosomes, strengthening the idea that endosomes are not just a path to degradation and recycling (Sorkin et al., 2000; Wouters and Bastiaens, 1999). The endocytic machinery, positioned both temporally and physically between the plasma membrane and the lysosomal compartment, thus provides a mechanism not only for signal downregulation at the plasma membrane, but also for signal propagation in the cytosol. Moreover, endosomes provide a reaction platform on which protein assemblies can generate new functionality, as has been shown for APPL. These proteins are multifunctional adaptors and effectors of RAB5, which localize to a subpopulation of early endosomes but are also capable of nucleocytoplasmic shuttling. They have as many cytoplasmic targets as they have nuclear targets (Rashid et al., 2009; Schenck et al., 2008). APPL-harboring endosomes are therefore an intermediate in signaling between the plasma membrane and the nucleus (Rashid et al., 2009).

Endosomes also provide a more long-range transport signal, as shown for TRKA in neurons (Cosker et al., 2008; Ehlers et al., 1995). Motor proteins such as dyneins can transport endosomes along microtubule tracks, allowing for rapid signal propagation into the cytoplasm. Kinesins can effectively transport vesicles back to the plasma membrane. For example, NGF-activated TRKA-containing endosomes contribute to retrograde transport of survival signals from the tip of the axon to the soma of the neuron. This type of signal transport is of special relevance to cells that have extended cytoplasmic structures like neurons, where growth cone signals have to travel over millimeter lengths in order to reach the cell soma. This would take days if mediated by passive diffusion alone (Howe, 2005).

The geometrical properties of endosomes, such as their reduced size and closed surface, also facilitate signal amplification. The elevated ligand concentration in the endosomal lumen makes full activation of contained receptors possible. In a similar way to membrane domains in the plasma membrane that are contained by the cytoskeleton, their reduced size increases the chance of amplifying growth factor signals in encounter-driven activation processes. In such processes, a higher chance of second encounters compensates for the fact that not all encounters lead to activation. This effect is especially significant in receptors that must undergo multiple activation steps to attain full kinase activity (Lemmon and Schlessinger, 2010) and leads to faster activation of these receptor populations (Figure 3). Examples of such RTKs are the insulin and IGF1 receptors.

Endosomes can also be an effective way to deliver hydrogen peroxide gradients deep inside of the cell to enhance local signaling in the cytoplasm around them. Here, a gradient that is generated proximal to the plasma membrane is transported
with the endosome (Oakley et al., 2009). For example, RAC-mediated NOX association with IL-1 and subsequent internalization in endosomes result in superoxide production and conversion to hydrogen peroxide in the lumen of the endosome (Li et al., 2006). The diffusion of hydrogen peroxide out of the lumen of the endosome to the surrounding cytosol results in efficient coupling to its targets in the cytoplasm. At the plasma membrane, this coupling is less efficient, as half of the hydrogen peroxide is lost to the extracellular milieu.

The aforementioned examples show how endocytosis modulates signaling, but the inverse is also true. For example, the routing and timing of receptors in the endosomal system are regulated by cargo-mediated G protein signaling, mostly through RAB family G proteins. An example is the early-to-late endosomal conversion characterized by the RAB5-RAB7 switch (Spang, 2009). The timing of this switch involves coordination of two opposing behaviors: a fast positive and a slow negative feedback loop. In the early endosome, the recruitment of RAB5 by its GEF (RABX5) initiates a positive feedback loop whereby RAB5-GTP activates RABX5. Subsequent binding and activation of RAB5 effector molecules such as the phosphatidylinositol 3-kinase (PI3K) VPS34 result in the local synthesis of phosphatidylinositol 3-phosphate (PI3P). On the one hand, the accumulation of PI3P recruits SAND-1, which displaces RABX5 and thereby disrupts the RAB5-RABX5 positive feedback loop (Del Conte-Zerial et al., 2008; Poteryaev et al., 2010). On the other hand, it has been shown that the SAND-1 yeast homolog MON1p interacts with the HOPS complex. A subunit of this complex, VPS39p acts as a GEF for yeast RAB7 and therefore activates it (Bohdanowicz and Grinstein, 2010).
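The combination of a fast positive and a slow negative feedback loop in the RAB5-RAB7 switch can be caricatured with three ordinary differential equations. The sketch below is a toy model written for illustration only, not the published model of Del Conte-Zerial et al. (2008). All rate constants, the Hill-type threshold for SAND-1 recruitment, and the function name are hypothetical, units are dimensionless, and PI3P turnover is ignored.

```python
def simulate_rab_switch(t_end=200.0, dt=0.01):
    """Euler integration of a toy RAB5 -> RAB7 conversion model."""
    # Hypothetical parameters, chosen only to separate fast and slow time scales.
    k_basal, k_feedback, k_off5 = 0.1, 5.0, 1.0  # RAB5 activation and inactivation
    k_pi3p = 0.05                                # slow PI3P synthesis via active RAB5
    threshold, hill = 1.0, 4                     # PI3P level at which SAND-1 is recruited
    k_on7, k_off7 = 2.0, 0.2                     # RAB7 activation and inactivation

    rab5, pi3p, rab7 = 0.0, 0.0, 0.0
    times, rab5_trace, rab7_trace = [], [], []
    for step in range(round(t_end / dt) + 1):
        times.append(step * dt)
        rab5_trace.append(rab5)
        rab7_trace.append(rab7)

        # SAND-1 recruitment rises steeply once PI3P passes the threshold.
        sand1 = pi3p**hill / (threshold**hill + pi3p**hill)
        # SAND-1 displaces RABX5, so the available GEF activity falls.
        gef = 1.0 - sand1

        # Fast positive feedback: active RAB5 promotes its own activation via RABX5.
        d_rab5 = (k_basal + k_feedback * rab5) * gef * (1.0 - rab5) - k_off5 * rab5
        # Slow negative feedback: PI3P accumulates while RAB5 is active.
        d_pi3p = k_pi3p * rab5
        # SAND-1 (via the HOPS complex) activates RAB7.
        d_rab7 = k_on7 * sand1 * (1.0 - rab7) - k_off7 * rab7

        rab5 += dt * d_rab5
        pi3p += dt * d_pi3p
        rab7 += dt * d_rab7
    return times, rab5_trace, rab7_trace


if __name__ == "__main__":
    times, rab5_trace, rab7_trace = simulate_rab_switch()
    for index in (0, 500, 3000, len(times) - 1):
        print(f"t={times[index]:6.1f}  RAB5={rab5_trace[index]:.2f}  RAB7={rab7_trace[index]:.2f}")
```

In this sketch active RAB5 rises quickly, stays high while PI3P builds up, and is then replaced by active RAB7, so the delay before conversion is set by the slow variable rather than by the fast loop.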
In essence, this system behaves like a transistor in gain amplification mode wherein the cargo-mediated activation of PI3K (the signal) generates the base current of PI3P that, upon accumulation, triggers the RAB5-RAB7 switch (Vartak and Bastiaens, 2010). The cargo therefore influences its own endocytic routing in a time-dependent manner. Just as the primordial plasma membrane provided a surface within which reactions were facilitated through containment of diffusion (Griffiths, 2007), the endocytic system does the same for the cytosol of the eukaryotic cell. It also constitutes a shuttling service, providing bidirectional communication between the plasma membrane and the interior of the cell. Evolution has thus tinkered with the endocytic machinery to generate information-processing functions (Jacob, 1977). This tight coupling between the membrane and signaling systems is becoming increasingly important not only to our understanding of the function of both systems, but also to our ability to steer them away from pathological behaviors.

Figure 3. Constraining Receptor Diffusion Accelerates Activation
(A) Example of Brownian motion simulations for a receptor with a single (top) or double (bottom) step activation mechanism. A plot shows the time evolution (x axis) for 20 receptors (y axis) for a particular realization of the simulation in which a single receptor is activated at time 0. The receptor state is color coded. Three key frames depicting the receptor's trajectory for highlighted particles are shown.
(B) Average (solid line) evolution of the fraction of receptor in each state for 100 realizations of the simulation. The standard error of the mean is indicated (dashed lines). t1 and t2 indicate the times at which the activated population reaches 50%.
(C) The simulation was repeated for different domain sizes and numbers of particles. The difference between t2 and t1 is reduced as the size of the domain is reduced.

Signal Propagation between Membrane Compartments

In the previous section, we discussed how the endocytic machinery transports the reactive properties of the plasma membrane into the cytoplasm to propagate signals. We now discuss the spatial organization of the small G protein RAS as an example of a mechanism that transports the reactive properties of the plasma membrane to another membrane compartment within the cytoplasm. Here, the association between the signal carrier and membranes is achieved through the addition
of lipid anchors to the protein. Such lipid modifications are required for the membrane targeting of many proteins and for the enrichment of target proteins in specific microdomains on organelles (Hancock, 2003; Resh, 1999).

Figure 4. Localization of RAS through the Acylation Cycle
(A) To create a full three-dimensional (3D) simulation of the reaction-diffusion system that underlies the acylation cycle, a 3D stack of confocal images (left) is registered to identify three compartments (right): plasma membrane (white), cytosol/endoplasmic reticulum (gray), and Golgi (red). Computer simulations were performed with a cellular automaton approach to reflect: palmitoylated versus unpalmitoylated RAS; localized PAT-activity at the Golgi; ubiquitous thioesterase activity; unidirectional transport of palmitoylated species from Golgi to the plasma membrane; high inter- and intracompartmental mobility of unpalmitoylated versus low mobility of palmitoylated species.
(B) Localization of the two species of palmitoylated (green) versus unpalmitoylated RAS (red) in presence of ubiquitous thioesterase activity (top row) and blocking thioesterase activity (bottom row) shown as an overlay of both species colored according to the 2D color map (right), wherein white denotes oversaturation of palmitoylated RAS (otherwise green). Starting from the initial condition of 100% unpalmitoylated RAS (left), the distribution of palmitoylation evolves toward enrichment of palmitoylated RAS at the plasma membrane and Golgi (upper-right) versus unspecific distribution over all membranes in case of thioesterase inhibition (lower-right).

Small G proteins exist in either a GTP-bound (activated) or GDP-bound (inactivated) state within a catalytic GTPase cycle operated through the intervention of guanine nucleotide exchange factors (GEFs) and GTPase-activating proteins (GAPs). The GTP-binding proteins of the RAS family, which are involved in a wide range of signal transduction processes, undergo various lipid modifications at the C terminus. The H-RAS and N-RAS isoforms undergo two types of lipid modification: an irreversible farnesylation at the cysteine residue of the CAAX box, followed by reversible palmitoylation at specific cysteine residues in the C-terminal hypervariable region (HVR). This S-acylation is unique in that it is the only known reversible lipid modification (Linder and Deschenes, 2003; Smotrys and Linder, 2004). Farnesylation conveys a membrane affinity to RAS but still allows high intercompartmental mobility. Additional palmitoylation further increases its affinity to membranes without conferring specificity for any membrane compartment.

How then can spatial organization arise from these posttranslational lipidations? Consider unpalmitoylated but farnesylated RAS (Figure 4B, first column), which is distributed homogeneously among membrane compartments in the cell. Localizing palmitoyl transferase (PAT) activity to the Golgi apparatus (Rocks et al., 2005) enables this membrane compartment to trap newly palmitoylated RAS. This trapping occurs because palmitoylation enhances the stability of the interaction of RAS with the membrane, thereby slowing its diffusion (Silvius and l'Heureux, 1994). However, if this were the end of the story, all RAS molecules would eventually be trapped at the Golgi and would then slowly redistribute over all membranes to reach a homogeneous equilibrium distribution. Before this can occur, the nonequilibrium state of RAS enrichment at the Golgi is transferred to the plasma membrane via the secretory pathway (Choy et al., 1999; Rocks et al., 2010). Away from the high PAT activity at the Golgi, the palmitoyl group is removed by ubiquitous thioesterase activity. This depalmitoylated, farnesylated N/H-RAS rapidly redistributes over all endomembranes, enhancing the chance of re-encounter and trapping at the Golgi. Repeated cycles of de/repalmitoylation together with Golgi trapping by palmitoylation and the directionality of the secretory pathway thus constitute a spatially organizing system that counters the entropy-driven re-equilibration of lipidated RAS throughout the membranes of the cell (Rocks et al., 2010; Rocks et al., 2005).

Simulation of this reaction-diffusion system using realistic reaction and diffusion parameters shows that our intuition that the proposed dynamic mechanism can generate the observed asymmetric spatial organization of RAS is correct (Figure 4). We can also predict that any interference with the dynamics of the acylation cycle will cause RAS to lose its specific localization, irrespective of its lipidation state (Figure 4B, lower row). This insight leads to a counterintuitive target for affecting RAS localization and thereby its signaling capacity: thioesterase activity. Palmostatin-B, an inhibitor of the cellular thioesterase APT1, has indeed been shown to cause fully palmitoylated RAS to redistribute more equally between membrane compartments and thereby lower oncogenic H-RAS-G12V signaling activity (Dekker et al., 2010).

Let us now consider how the spatial organization of RAS affects information processing and transfer across membrane compartments. G proteins are active in the GTP-bound form because it is in this state that they can interact with effectors.
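Before turning to RAS activity, the localization logic of the acylation cycle can be illustrated with a much coarser model than the 3D cellular automaton of Figure 4. The sketch below swaps in a well-mixed three-compartment model (Golgi, plasma membrane, and remaining endomembranes) integrated with the Euler method. The membrane area fractions, all rate constants, and the slow exchange term for palmitoylated RAS are hypothetical values chosen for illustration, and unpalmitoylated RAS is assumed to equilibrate instantly over all membranes in proportion to their area.

```python
def acylation_cycle(k_thioesterase, t_end=20000.0, dt=0.1):
    """Return RAS enrichment per compartment: share of RAS / share of membrane area."""
    # Hypothetical membrane area fractions (must sum to 1).
    area = {"golgi": 0.05, "pm": 0.15, "er": 0.80}
    k_pat = 2.0     # palmitoylation of RAS that visits the Golgi (1/s)
    k_sec = 0.005   # secretory transport, Golgi -> plasma membrane (1/s)
    k_mix = 0.0005  # slow exchange of palmitoylated RAS between membranes (1/s)

    unpalm = 1.0  # initial condition: 100% unpalmitoylated RAS
    palm = {"golgi": 0.0, "pm": 0.0, "er": 0.0}
    for _ in range(round(t_end / dt)):
        palm_total = sum(palm.values())
        # PAT activity is localized to the Golgi and acts on the RAS found there.
        newly_palmitoylated = k_pat * area["golgi"] * unpalm
        change = {
            name: -k_thioesterase * palm[name]
            + k_mix * (area[name] * palm_total - palm[name])
            for name in palm
        }
        change["golgi"] += newly_palmitoylated - k_sec * palm["golgi"]
        change["pm"] += k_sec * palm["golgi"]
        # Ubiquitous thioesterase activity returns RAS to the mobile pool.
        unpalm += dt * (k_thioesterase * palm_total - newly_palmitoylated)
        for name in palm:
            palm[name] += dt * change[name]
    return {name: (palm[name] + area[name] * unpalm) / area[name] for name in palm}


if __name__ == "__main__":
    print("thioesterase active: ", acylation_cycle(k_thioesterase=0.01))
    print("thioesterase blocked:", acylation_cycle(k_thioesterase=0.0))
```

With thioesterase activity, this toy cycle enriches RAS at the Golgi and the plasma membrane relative to their membrane area. With thioesterase blocked, the fully palmitoylated pool relaxes toward an area-weighted distribution and the Golgi enrichment is lost, in qualitative agreement with Figure 4B.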

The GTP binding state of RAS is almost exclusively determined by the relative local abundance of GEFs and GAPs and has no influence on the acylation cycle that regulates its spatial organization. By differentially localizing GAPs and GEFs to specific membrane compartments, a local RAS activity state can therefore be generated. For example, upon binding of ligand to RTKs at the plasma membrane, an increase in GEF activity increases the local concentration of active GTP-RAS. The Son of Sevenless (SOS) canonical RAS-GEF not only interacts with activated phosphorylated receptors through binding adaptor proteins, such as GRB2 (Innocenti et al., 2002; Jang et al., 2010), but also contains an allosteric site that increases its activity upon RAS-GTP binding (Freedman et al., 2006). In solution, only low-affinity binding of SOS to RAS-GTP is observed. Within the cell, however, the effective concentration of SOS is enhanced by its sequestration within the two-dimensional plasma membrane, where RAS is also enriched. This reduction in dimensionality increases the effective concentration of both SOS and RAS and hence facilitates activation of the system through positive feedback.

RAS activation is countered by increasing RAS-GAP concentrations at the plasma membrane. A slower or delayed increase in RAS-GAP concentration thus generates a pulse of RAS activity at the plasma membrane (Augsten et al., 2006). Subsequently, this membrane-proximal activation pulse is subjected to the acylation cycle, transforming the spatial organization of RAS into a temporal response spanning the cell. The acylation cycle can therefore be considered as a carrier wave that links the intracellular membrane compartments and is modulated by the state of the plasma membrane. It was recently shown that the Golgi lacks RAS-specific GEF/GAP activity in certain cells and appears to act as a passive receiver of the RAS signal from the plasma membrane (Lorentzen et al., 2010). In this case, a diffusion-broadened echo of the original pulse of RAS activity is observed at the Golgi. The plasma membrane, because of its high degree of GEF/GAP regulation, is, in contrast, effectively decoupled from the activity of RAS at the Golgi.

The endoplasmic reticulum (ER), however, acting as a platform for fast trafficking of depalmitoylated RAS between the PM and the Golgi, offers a stage for further regulation of RAS activity before it becomes trapped at the Golgi. Indeed, growth factor-induced upregulation of GEF activity at the ER, either by removal of GAPs or increase of GEFs, causes sustained RAS activity at the Golgi (Lorentzen et al., 2010). This system is capable of generating a biphasic response at the Golgi: a broadened echo from the plasma membrane pulse convoluted with sustained activity from the ER. Given that there are now many known examples of signaling networks in which the gene expression machinery is sensitive to the temporal properties of upstream signals, the capacity to transfer reaction properties across membrane surfaces is likely to have a fundamental impact on the regulation of cellular state (Goentoro et al., 2009; Kholodenko and Kolch, 2008; Murphy et al., 2004).

This is especially important when considering cell fate in the presence of oncogenic, constitutively active forms of RAS that "short-circuit" the above signaling network into constant activity. Modulating the localization of such signaling molecules can be a means of restoring the switching functionality of those networks, in effect giving the cell back its decision-making capacity. For example, the epithelial cell line MDCK-f3, which expresses oncogenic H-RAS-G12V, has undergone epithelial-to-mesenchymal transition (Thiery et al., 2009), thereby losing cell contact inhibition (Chen et al., 2011). This phenotype can be reset to the overall cell shape and contact inhibition level of untransformed cells by the thioesterase inhibitor palmostatin-B (Dekker et al., 2010), which results in a decreased concentration of oncogenic H-RAS-G12V at the plasma membrane and Golgi.

Though this will reduce interactions between RAS and its effectors such as RAF at the plasma membrane, effector activation is not the only point that needs to be considered. An oncogenic RAS mutation encoded at a single allele has the potential to activate wild-type RAS (encoded by the other allele) via the SOS feedback system described above in a dose-dependent manner. If the dose of oncogenic RAS at the plasma membrane is sufficiently low, the system can still respond to growth factor binding to cognate receptors through wild-type RAS. The oncogenic RAS generates an offset in the downstream signaling amplitude (for example in activated ERK) that is filtered out by the fold change detection mechanism in the gene expression machinery (Goentoro et al., 2009). However, if the dose of oncogenic RAS at the plasma membrane is high enough to overcome a threshold to activate wild-type RAS by the SOS feedback loop, all RAS in the cell is activated and can no longer switch in response to extracellular stimulation. Lowering the amount of palmitoylatable RAS proximal to the plasma membrane by thioesterase inhibition using palmostatin-B thus reduces the effective dose of oncogenic RAS at the plasma membrane below the threshold that is needed to trigger SOS feedback. This effectively inactivates the wild-type RAS population, such that it can respond again to growth factor activation by nucleotide exchange and therefore reacquire its decision-making capacity.

Perspectives

The living plasma membrane and its extension inside of the cell as endocytic vesicles constitute a system to receive, integrate, and distribute external and internal signals. True understanding of the living plasma membrane and its intricate connection to signaling requires novel experimental and theoretical methods that can take full account of the spatiotemporal asymmetries and confinement of its components, both of which provide the cell with its unique ability to regulate signal propagation.

Bottom-up approaches such as the generation of reconstituted living membranes that mimic the dynamics of biological systems will be indispensable to move away from equilibrium biophysics. Here, the molecular machinery for membrane fusion and fission needs to be introduced in reconstituted membrane systems (Wollert and Hurley, 2010) to mimic the effects of vesicle transport dynamics on the information processing ability of a membrane. Moreover, the introduction of an endomembrane system in prokaryotes together with further research into bacterial organisms with an endocytic machinery (Lonhienne et al., 2010) will provide insight into the development of cell compartmentalization and its effect on signal processing.

Top-down approaches such as spatiotemporal monitoring of the endocytic system in eukaryotes will be vital to understand
signal integration. The importance of different modes of spatial propagation in signaling is now clear. However, more and better genetically encoded fluorescent protein biosensors that report on the local activity of proteins need to be developed in order for us to better verify our hypotheses. This should go hand in hand with the further development of new functional imaging approaches such as fluorescence correlation spectroscopy, with sufficient spatial resolution to take advantage of the full potential of these biosensors (Maeder et al., 2007). Experimental approaches also need to be complemented by computational models and simulations not only to accurately interpret the experimental results, but also to place them in a correct spatiotemporal frame. We have taken as an example the properties arising from localization-dependent interaction in a realistic experimentally acquired cellular geometry. Such models can be expected to show emergent and often counterintuitive behaviors that are not readily understood.

The convergence of bottom-up and top-down experimental approaches, together with computational models and simulations that feed back into the experimental design, will eventually allow us to move away from a static picture of the plasma membrane to a movie, featuring a dynamic and living entity that generates shapes and makes decisions.

REFERENCES

Augsten, M., Pusch, R., Biskup, C., Rennert, K., Wittig, U., Beyer, K., Blume, A., Wetzker, R., Friedrich, K., and Rubio, I. (2006). Live-cell imaging of endogenous Ras-GTP illustrates predominant Ras activation at the plasma membrane. EMBO Rep. 7, 46–51.
Baass, P.C., Di Guglielmo, G.M., Authier, F., Posner, B.I., and Bergeron, J.J. (1995). Compartmentalized signal transduction by receptor tyrosine kinases. Trends Cell Biol. 5, 465–470.
Birtwistle, M.R., and Kholodenko, B.N. (2009). Endocytosis and signalling: a meeting with mathematics. Mol. Oncol. 3, 308–320.
Bohdanowicz, M., and Grinstein, S. (2010). Vesicular traffic: a Rab SANDwich. Curr. Biol. 20, R311–R314.
Brown, D.I., and Griendling, K.K. (2009). Nox proteins in signal transduction. Free Radic. Biol. Med. 47, 1239–1253.
Buday, L., Warne, P.H., and Downward, J. (1995). Downregulation of the Ras activation pathway by MAP kinase phosphorylation of Sos. Oncogene 11, 1327–1331.
Chen, Y.S., Mathias, R.A., Mathivanan, S., Kapp, E.A., Moritz, R.L., Zhu, H.J., and Simpson, R.J. (2011). Proteomics profiling of Madin-Darby canine kidney plasma membranes reveals Wnt-5a involvement during oncogenic H-Ras/TGF-β-mediated epithelial-mesenchymal transition. Mol. Cell. Proteomics 10, M110.001131.
Choy, E., Chiu, V.K., Silletti, J., Feoktistov, M., Morimoto, T., Michaelson, D.,
Dekker, F.J., Rocks, O., Vartak, N., Menninger, S., Hedberg, C., Balamurugan, R., Wetzel, S., Renner, S., Gerauer, M., Schölermann, B., et al. (2010). Small-molecule inhibition of APT1 affects Ras localization and signaling. Nat. Chem. Biol. 6, 449–456.
Del Conte-Zerial, P., Brusch, L., Rink, J.C., Collinet, C., Kalaidzidis, Y., Zerial, M., and Deutsch, A. (2008). Membrane identity and GTPase cascades regulated by toggle and cut-out switches. Mol. Syst. Biol. 4, 206.
Deribe, Y.L., Pawson, T., and Dikic, I. (2010). Post-translational modifications in signal integration. Nat. Struct. Mol. Biol. 17, 666–672.
Devaux, P.F., and Morris, R. (2004). Transmembrane asymmetry and lateral domains in biological membranes. Traffic 5, 241–246.
Ehlers, M.D., Kaplan, D.R., Price, D.L., and Koliatsos, V.E. (1995). NGF-stimulated retrograde transport of trkA in the mammalian nervous system. J. Cell Biol. 130, 149–156.
Finkel, T. (2006). Intracellular redox regulation by the family of small GTPases. Antioxid. Redox Signal. 8, 1857–1863.
Fischer, E.H., Charbonneau, H., and Tonks, N.K. (1991). Protein tyrosine phosphatases: a diverse family of intracellular and transmembrane enzymes. Science 253, 401–406.
Frank, C., Burkhardt, C., Imhof, D., Ringel, J., Zschörnig, O., Wieligmann, K., Zacharias, M., and Böhmer, F.D. (2004). Effective dephosphorylation of Src substrates by SHP-1. J. Biol. Chem. 279, 11375–11383.
Freedman, T.S., Sondermann, H., Friedland, G.D., Kortemme, T., Bar-Sagi, D., Marqusee, S., and Kuriyan, J. (2006). A Ras-induced conformational switch in the Ras activator Son of sevenless. Proc. Natl. Acad. Sci. USA 103, 16692–16697.
Garrenton, L.S., Stefan, C.J., McMurray, M.A., Emr, S.D., and Thorner, J. (2010). Pheromone-induced anisotropy in yeast plasma membrane phosphatidylinositol-4,5-bisphosphate distribution is required for MAPK signaling. Proc. Natl. Acad. Sci. USA 107, 11805–11810.
Goentoro, L., Shoval, O., Kirschner, M.W., and Alon, U. (2009). The incoherent feedforward loop can provide fold-change detection in gene regulation. Mol. Cell 36, 894–899.
Griffiths, G. (2007). Cell evolution and the problem of membrane topology. Nat. Rev. Mol. Cell Biol. 8, 1018–1024.
Grimes, M.L., Zhou, J., Beattie, E.C., Yuen, E.C., Hall, D.E., Valletta, J.S., Topp, K.S., LaVail, J.H., Bunnett, N.W., and Mobley, W.C. (1996). Endocytosis of activated TrkA: evidence that nerve growth factor induces formation of signaling endosomes. J. Neurosci. 16, 7950–7964.
Hancock, J.F. (2003). Ras proteins: different signals from different locations. Nat. Rev. Mol. Cell Biol. 4, 373–384.
Harding, A.S., and Hancock, J.F. (2008). Using plasma membrane nanoclusters to build better signaling circuits. Trends Cell Biol. 18, 364–371.
Howe, C.L. (2005). Modeling the signaling endosome hypothesis: why a drive to the nucleus is better than a (random) walk. Theor. Biol. Med. Model. 2, 43.
Inman, G.J., Nicolás, F.J., and Hill, C.S. (2002). Nucleocytoplasmic shuttling of Smads 2, 3, and 4 permits sensing of TGF-beta receptor activity. Mol. Cell 10, 283–294.
Innocenti, M., Tenca, P., Frittoli, E., Faretta, M., Tocchetti, A., Di Fiore, P.P., and Scita, G.
(2002). Mechanisms through which Sos-1 coordinates the acti- Ivanov, I.E., and Philips, M.R. (1999). Endomembrane trafficking of ras: the vation of Ras and Rac. J. Cell Biol. 156, 125–136. CAAX motif targets proteins to the ER and Golgi. Cell 98, 69–80. Jacob, F. (1977). Evolution and tinkering. Science 196, 1161–1166. Clayton, A.H., Walker, F., Orchard, S.G., Henderson, C., Fuchs, D., Rothacker, Jang, I.K., Zhang, J., Chiang, Y.J., Kole, H.K., Cronshaw, D.G., Zou, Y., and J., Nice, E.C., and Burgess, A.W. (2005). Ligand-induced dimer-tetramer tran- Gu, H. (2010). Grb2 functions at the top of the T-cell antigen receptor-induced sition during the activation of the cell surface epidermal growth factor tyrosine kinase cascade to control thymic selection. Proc. Natl. Acad. Sci. USA receptor-A multidimensional microscopy analysis. J. Biol. Chem. 280, 107, 10620–10625. 30392–30399. Janmey, P.A., and Kinnunen, P.K. (2006). Biophysical properties of lipids and Cosker, K.E., Courchesne, S.L., and Segal, R.A. (2008). Action in the axon: dynamic membranes. Trends Cell Biol. 16, 538–546. 18 generation and transport of signaling endosomes. Curr. Opin. Neurobiol. , Janssen-Heininger, Y.M., Mossman, B.T., Heintz, N.H., Forman, H.J., Kalya- 270–275. naraman, B., Finkel, T., Stamler, J.S., Rhee, S.G., and van der Vliet, A. Dehmelt, L., and Bastiaens, P.I. (2010). Spatial organization of intracellular (2008). Redox-based regulation of signal transduction: principles, pitfalls, communication: insights from imaging. Nat. Rev. Mol. Cell Biol. 11, 440–452. and promises. Free Radic. Biol. Med. 45, 1–17.

Karlsson, S., Kowanetz, K., Sandin, A., Persson, C., Ostman, A., Heldin, C.H., and Hellberg, C. (2006). Loss of T-cell protein tyrosine phosphatase induces recycling of the platelet-derived growth factor (PDGF) beta-receptor but not the PDGF alpha-receptor. Mol. Biol. Cell 17, 4846–4855.

Kholodenko, B.N., and Kolch, W. (2008). Giving space to cell signaling. Cell 133, 566–567.

Kusumi, A., Nakada, C., Ritchie, K., Murase, K., Suzuki, K., Murakoshi, H., Kasai, R.S., Kondo, J., and Fujiwara, T. (2005). Paradigm shift of the plasma membrane concept from the two-dimensional continuum fluid to the partitioned fluid: high-speed single-molecule tracking of membrane molecules. Annu. Rev. Biophys. Biomol. Struct. 34, 351–378.

Kusumi, A., and Sako, Y. (1996). Cell surface organization by the membrane skeleton. Curr. Opin. Cell Biol. 8, 566–574.

Kwik, J., Boyle, S., Fooksman, D., Margolis, L., Sheetz, M.P., and Edidin, M. (2003). Membrane cholesterol, lateral mobility, and the phosphatidylinositol 4,5-bisphosphate-dependent organization of cell actin. Proc. Natl. Acad. Sci. USA 100, 13964–13969.

Le Roy, C., and Wrana, J.L. (2005). Clathrin- and non-clathrin-mediated endocytic regulation of cell signalling. Nat. Rev. Mol. Cell Biol. 6, 112–126.

Lee, E., Salic, A., Krüger, R., Heinrich, R., and Kirschner, M.W. (2003). The roles of APC and Axin derived from experimental and theoretical analysis of the Wnt pathway. PLoS Biol. 1, E10.

Lee, S.R., Kwon, K.S., Kim, S.R., and Rhee, S.G. (1998). Reversible inactivation of protein-tyrosine phosphatase 1B in A431 cells stimulated with epidermal growth factor. J. Biol. Chem. 273, 15366–15372.

Lemmon, M.A., and Schlessinger, J. (2010). Cell signaling by receptor tyrosine kinases. Cell 141, 1117–1134.

Li, Q., Harraz, M.M., Zhou, W., Zhang, L.N., Ding, W., Zhang, Y., Eggleston, T., Yeaman, C., Banfi, B., and Engelhardt, J.F. (2006). Nox2 and Rac1 regulate H2O2-dependent recruitment of TRAF6 to endosomal interleukin-1 receptor complexes. Mol. Cell. Biol. 26, 140–154.

Lim, W.A., and Pawson, T. (2010). Phosphotyrosine signaling: evolving a new cellular communication system. Cell 142, 661–667.

Linder, M.E., and Deschenes, R.J. (2003). New insights into the mechanisms of protein palmitoylation. Biochemistry 42, 4311–4320.

Lingwood, D., and Simons, K. (2010). Lipid rafts as a membrane-organizing principle. Science 327, 46–50.

Lonhienne, T.G., Sagulenko, E., Webb, R.I., Lee, K.C., Franke, J., Devos, D.P., Nouwens, A., Carroll, B.J., and Fuerst, J.A. (2010). Endocytosis-like protein uptake in the bacterium Gemmata obscuriglobus. Proc. Natl. Acad. Sci. USA 107, 12883–12888.

Lorentzen, A., Kinkhabwala, A., Rocks, O., Vartak, N., and Bastiaens, P.I.H. (2010). Regulation of Ras localization by acylation enables a mode of intracellular signal propagation. Sci. Signal. 3, ra68.

Maeder, C.I., Hink, M.A., Kinkhabwala, A., Mayr, R., Bastiaens, P.I., and Knop, M. (2007). Spatial regulation of Fus3 MAP kinase activity through a reaction-diffusion mechanism in yeast pheromone signalling. Nat. Cell Biol. 9, 1319–1326.

Markus, M., Böhm, D., and Schmick, M. (1999). Simulation of vessel morphogenesis using cellular automata. Math. Biosci. 156, 191–206.

Marshall, C.J. (1995). Specificity of receptor tyrosine kinase signaling: transient versus sustained extracellular signal-regulated kinase activation. Cell 80, 179–185.

Matheos, D., Metodiev, M., Muller, E., Stone, D., and Rose, M.D. (2004). Pheromone-induced polarization is dependent on the Fus3p MAPK acting through the formin Bni1p. J. Cell Biol. 165, 99–109.

Mellman, I., and Nelson, W.J. (2008). Coordinated protein sorting, targeting and distribution in polarized cells. Nat. Rev. Mol. Cell Biol. 9, 833–845.

Miaczynska, M., Pelkmans, L., and Zerial, M. (2004). Not just a sink: endosomes in control of signal transduction. Curr. Opin. Cell Biol. 16, 400–406.

Murphy, L.O., MacKeigan, J.P., and Blenis, J. (2004). A network of immediate early gene products propagates subtle differences in mitogen-activated protein kinase signal amplitude and duration. Mol. Cell. Biol. 24, 144–153.

Muth, T.R., and Caplan, M.J. (2003). Transport protein trafficking in polarized cells. Annu. Rev. Cell Dev. Biol. 19, 333–366.

Nakakuki, T., Birtwistle, M.R., Saeki, Y., Yumoto, N., Ide, K., Nagashima, T., Brusch, L., Ogunnaike, B.A., Okada-Hatakeyama, M., and Kholodenko, B.N. (2010). Ligand-specific c-Fos expression emerges from the spatiotemporal control of ErbB network dynamics. Cell 141, 884–896.

Nelson, S., Horvat, R.D., Malvey, J., Roess, D.A., Barisas, B.G., and Clay, C.M. (1999). Characterization of an intrinsically fluorescent gonadotropin-releasing hormone receptor and effects of ligand binding on receptor lateral diffusion. Endocrinology 140, 950–957.

Niethammer, P., Bastiaens, P., and Karsenti, E. (2004). Stathmin-tubulin interaction gradients in motile and mitotic cells. Science 303, 1862–1866.

Niethammer, P., Kronja, I., Kandels-Lewis, S., Rybina, S., Bastiaens, P., and Karsenti, E. (2007). Discrete states of a protein interaction network govern interphase and mitotic microtubule dynamics. PLoS Biol. 5, e29.

Oakley, F.D., Abbott, D., Li, Q., and Engelhardt, J.F. (2009). Signaling components of redox active endosomes: the redoxosomes. Antioxid. Redox Signal. 11, 1313–1333.

Poteryaev, D., Datta, S., Ackema, K., Zerial, M., and Spang, A. (2010). Identification of the switch in early-to-late endosome transition. Cell 141, 497–508.

Ramadurai, S., Duurkens, R., Krasnikov, V.V., and Poolman, B. (2010). Lateral diffusion of membrane proteins: consequences of hydrophobic mismatch and lipid composition. Biophys. J. 99, 1482–1489.

Rashid, S., Pilecka, I., Torun, A., Olchowik, M., Bielinska, B., and Miaczynska, M. (2009). Endosomal adaptor proteins APPL1 and APPL2 are novel activators of beta-catenin/TCF-mediated transcription. J. Biol. Chem. 284, 18115–18128.

Resh, M.D. (1999). Fatty acylation of proteins: new insights into membrane targeting of myristoylated and palmitoylated proteins. Biochim. Biophys. Acta 1451, 1–16.

Reynolds, A.R., Tischer, C., Verveer, P.J., Rocks, O., and Bastiaens, P.I. (2003). EGFR activation coupled to inhibition of tyrosine phosphatases causes lateral signal propagation. Nat. Cell Biol. 5, 447–453.

Rhee, S.G. (2006). Cell signaling. H2O2, a necessary evil for cell signaling. Science 312, 1882–1883.

Rocks, O., Gerauer, M., Vartak, N., Koch, S., Huang, Z.P., Pechlivanis, M., Kuhlmann, J., Brunsveld, L., Chandra, A., Ellinger, B., et al. (2010). The palmitoylation machinery is a spatially organizing system for peripheral membrane proteins. Cell 141, 458–471.

Rocks, O., Peyker, A., Kahms, M., Verveer, P.J., Koerner, C., Lumbierres, M., Kuhlmann, J., Waldmann, H., Wittinghofer, A., and Bastiaens, P.I. (2005). An acylation cycle regulates localization and activity of palmitoylated Ras isoforms. Science 307, 1746–1752.

Rodriguez-Boulan, E., Kreitzer, G., and Müsch, A. (2005). Organization of vesicular trafficking in epithelia. Nat. Rev. Mol. Cell Biol. 6, 233–247.

Saffman, P.G., and Delbrück, M. (1975). Brownian motion in biological membranes. Proc. Natl. Acad. Sci. USA 72, 3111–3113.

Sawano, A., Takayama, S., Matsuda, M., and Miyawaki, A. (2002). Lateral propagation of EGF signaling after local stimulation is dependent on receptor density. Dev. Cell 3, 245–257.

Schenck, A., Goto-Silva, L., Collinet, C., Rhinn, M., Giner, A., Habermann, B., Brand, M., and Zerial, M. (2008). The endosomal protein Appl1 mediates Akt substrate specificity and cell survival in vertebrate development. Cell 133, 486–497.

Scita, G., and Di Fiore, P.P. (2010). The endocytic matrix. Nature 463, 464–473.

Silvius, J.R., and l’Heureux, F. (1994). Fluorimetric evaluation of the affinities of isoprenylated peptides for lipid bilayers. Biochemistry 33, 3014–3022.

Simons, K., and Ikonen, E. (1997). Functional rafts in cell membranes. Nature 387, 569–572.

Simons, K., and van Meer, G. (1988). Lipid sorting in epithelial cells. Biochemistry 27, 6197–6202.

Singer, S.J., and Nicolson, G.L. (1972). The fluid mosaic model of the structure of cell membranes. Science 175, 720–731.

Smotrys, J.E., and Linder, M.E. (2004). Palmitoylation of intracellular signaling proteins: regulation and function. Annu. Rev. Biochem. 73, 559–587.

Sorkin, A., Krolenko, S., Kudrjavtceva, N., Lazebnik, J., Teslenko, L., Soderquist, A.M., and Nikolsky, N. (1991). Recycling of epidermal growth factor-receptor complexes in A431 cells: identification of dual pathways. J. Cell Biol. 112, 55–63.

Sorkin, A., McClure, M., Huang, F., and Carter, R. (2000). Interaction of EGF receptor and grb2 in living cells visualized by fluorescence resonance energy transfer (FRET) microscopy. Curr. Biol. 10, 1395–1398.

Sorkin, A., and von Zastrow, M. (2009). Endocytosis and signalling: intertwining molecular networks. Nat. Rev. Mol. Cell Biol. 10, 609–622.

Spang, A. (2009). On the fate of early endosomes. Biol. Chem. 390, 753–759.

Stelling, J., and Kholodenko, B.N. (2009). Signaling cascades as cellular devices for spatial computations. J. Math. Biol. 58, 35–55.

Suzuki, A., and Ohno, S. (2006). The PAR-aPKC system: lessons in polarity. J. Cell Sci. 119, 979–987.

Thiery, J.P., Acloque, H., Huang, R.Y., and Nieto, M.A. (2009). Epithelial-mesenchymal transitions in development and disease. Cell 139, 871–890.

Tischer, C., and Bastiaens, P.I. (2003). Lateral phosphorylation propagation: an aspect of feedback signalling? Nat. Rev. Mol. Cell Biol. 4, 971–974.

Turing, A.M. (1952). The Chemical Basis of Morphogenesis. Philos. Trans. R. Soc. Lond. B Biol. Sci. 237, 37–72.

Uchida, T., Matozaki, T., Noguchi, T., Yamao, T., Horita, K., Suzuki, T., Fujioka, Y., Sakamoto, C., and Kasuga, M. (1994). Insulin stimulates the phosphorylation of Tyr538 and the catalytic activity of PTP1C, a protein tyrosine phosphatase with Src homology-2 domains. J. Biol. Chem. 269, 12220–12228.

van den Bogaart, G., Holt, M.G., Bunt, G., Riedel, D., Wouters, F.S., and Jahn, R. (2010). One SNARE complex is sufficient for membrane fusion. Nat. Struct. Mol. Biol. 17, 358–364.

van der Wouden, J.M., Maier, O., van IJzendoorn, S.C., and Hoekstra, D. (2003). Membrane dynamics and the regulation of epithelial cell polarity. Int. Rev. Cytol. 226, 127–164.

van Drogen, F., Stucke, V.M., Jorritsma, G., and Peter, M. (2001). MAP kinase dynamics in response to pheromones in budding yeast. Nat. Cell Biol. 3, 1051–1059.

van Meer, G., Voelker, D.R., and Feigenson, G.W. (2008). Membrane lipids: where they are and how they behave. Nat. Rev. Mol. Cell Biol. 9, 112–124.

Vartak, N., and Bastiaens, P. (2010). Spatial cycles in G-protein crowd control. EMBO J. 29, 2689–2699.

Wiley, H.S., and Burke, P.M. (2001). Regulation of receptor tyrosine kinase signaling by endocytic trafficking. Traffic 2, 12–18.

Wollert, T., and Hurley, J.H. (2010). Molecular mechanism of multivesicular body biogenesis by ESCRT complexes. Nature 464, 864–869.

Woo, H.A., Yim, S.H., Shin, D.H., Kang, D., Yu, D.Y., and Rhee, S.G. (2010). Inactivation of peroxiredoxin I by phosphorylation allows localized H(2)O(2) accumulation for cell signaling. Cell 140, 517–528.

Wouters, F.S., and Bastiaens, P.I. (1999). Fluorescence lifetime imaging of receptor tyrosine kinase activity in cells. Curr. Biol. 9, 1127–1130.

Zidovetzki, R., Yarden, Y., Schlessinger, J., and Jovin, T.M. (1981). Rotational diffusion of epidermal growth factor complexed to cell surface receptors reflects rapid microaggregation and endocytosis of occupied receptors. Proc. Natl. Acad. Sci. USA 78, 6981–6985.

Zimmerberg, J., and Gawrisch, K. (2006). The physical chemistry of biological membranes. Nat. Chem. Biol. 2, 564–567.

Leading Edge Review

Measuring and Modeling Apoptosis in Single Cells

Sabrina L. Spencer1,2 and Peter K. Sorger1,* 1Center for Cell Decision Processes, Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA 2Present address: Department of Chemical and Systems Biology, Stanford University, Stanford, CA 94305, USA *Correspondence: [email protected] DOI 10.1016/j.cell.2011.03.002

Cell death plays an essential role in the development of tissues and organisms, the etiology of disease, and the responses of cells to therapeutic drugs. Here we review progress made over the last decade in using mathematical models and quantitative, often single-cell, data to study apoptosis. We discuss the delay that follows exposure of cells to prodeath stimuli, control of mitochondrial outer membrane permeabilization, switch-like activation of effector caspases, and variability in the timing and probability of death from one cell to the next. Finally, we discuss challenges facing the fields of biochemical modeling and systems pharmacology.

Introduction receptors, primarily members of the tumor necrosis factor Apoptosis is a form of programmed cell death involving receptor (TNFR) family (Kaufmann and Earnshaw, 2000). caspases, specialized cysteine proteases found in animal cells Receptor binding by TNF family ligands activates caspase- as inactive proenzymes (Fuentes-Prior and Salvesen, 2004). dependent pathways that are quite well understood in molecular Dramatic progress has been made in recent years in identifying terms. In general, extrinsic apoptosis has received more atten- and determining the biochemical activities and cellular functions tion than intrinsic apoptosis from investigators seeking to of biomolecules that regulate apoptosis and carry out its proteo- develop mathematical models, but extrinsic and intrinsic lytic program. However, current knowledge is largely qualitative apoptosis share many components and regulatory mechanisms. and descriptive, and the complex circuits that integrate prosur- The best studied inducers of extrinsic apoptosis are TNF-a, vival and prodeath signals to control the fates of normal and Fas ligand (FasL, also known as Apo-1/CD95 ligand), and TRAIL diseased cells remain poorly understood. Successful creation (TNF-related apoptosis-inducing ligand, also known as Apo2L; of quantitative and predictive computational models of Figure 1A). Binding of these ligands to trimers of cognate recep- apoptosis would be significant from both basic research and tors causes a conformational change that promotes assembly of clinical perspectives. From the standpoint of basic research, death-inducing signaling complexes (DISCs) on receptor cyto- apoptosis is a stereotypical systems-level problem in which plasmic tails (Gonzalvez and Ashkenazi, 2010). 
DISCs contain complex circuits involving graded and competing molecular multiple adaptor proteins, such as TRADD and FADD, which signals determine binary life-death decisions at a single-cell recruit and promote the activation of initiator procaspases. The level. Progress in modeling such decisions has had a significant composition of the DISC differs from one type of death receptor impact on the small but growing field of mammalian systems to the next and also changes upon receptor internalization biology. From a clinical perspective, diseases such as cancer (Schutze et al., 2008). A remarkable feature of TNF-family recep- involve disruption of the normal balance between cell prolifera- tors is that they activate both proapoptotic and prosurvival tion and cell death, and anticancer drugs are thought to achieve signaling cascades and the extent of cell death is determined their therapeutic effects by inducing apoptosis in cancer cells in part by the balance between these competing signals. Pro- (Fadeel et al., 1999). However, it is difficult to anticipate whether death processes are triggered by activation of initiator procas- a tumor cell will or will not be sensitive to a proapoptotic stimulus pases-8 and -10 at the DISC, a process that can be modulated or drug based on general knowledge of apoptosis biochemistry by the catalytically inactive procaspase-8 homolog FLIP because the importance of specific processes varies dramati- (Fuentes-Prior and Salvesen, 2004). Prosurvival processes are cally from one cell type to the next. Predictive, multifactorial, generally ascribed to activation of the NF-kB transcription factor, and context-sensitive computational models relevant to disease but other less well-understood processes are also involved, such states will impact drug discovery and clinical care. as induction of the mitogen-activated protein kinase (MAPK) and Apoptosis can be triggered by intrinsic and extrinsic stimuli. 
In Akt (protein kinase B) cascades (Falschlehner et al., 2007). intrinsic apoptosis, the death-inducing stimulus involves cellular Initiator caspases recruited to the DISC directly cleave effector damage or malfunction brought about by stress, ultraviolet (UV) procaspases-3 and -7 generating active proteases (Fuentes- or ionizing radiation, oncogene activation, toxin exposure, etc. Prior and Salvesen, 2004). Effector caspases cleave essential (Kaufmann and Earnshaw, 2000). Extrinsic apoptosis is triggered structural proteins such as cytokeratins and nuclear lamins by binding of extracellular ligands to specific transmembrane and also inhibitor of caspase-activated DNase (iCAD), which

926 Cell 144, March 18, 2011 ª2011 Elsevier Inc. liberates the DNase (CAD) to digest chromosomal DNA and cause cell death. So-called ‘‘type I’’ apoptosis, which comprises a direct pathway of receptor/initiator caspases/effector caspases/death, is thought to be sufficient for death in certain cell types, but in most cell types apoptosis occurs by a ‘‘type II’’ pathway in which mitochondrial outer membrane permeabilization (MOMP) is a necessary precursor to effector caspase activation (Scaffidi et al., 1998). MOMP is triggered by the formation of pores in the mitochondrial membrane. Pore formation is controlled by the 20 members of the Bcl-2 protein family, which can be roughly divided into four types: the ‘‘effectors’’ Bax and Bak whose oligomerization creates pores; ‘‘inhibitors’’ of Bax and Bak association such as Bcl-2, Mcl1, and BclxL; ‘‘activators’’ of Bax and Bak such as Bid and Bim; and ‘‘sensitizers’’ such as Bad, Bik, and Noxa that antagonize antiapoptotic Bcl- 2-like proteins (Letai, 2008). In extrinsic apoptosis, initiator caspases that have been activated at the DISC cleave Bid into tBid, which in turn promotes a conformational change in Bax and Bak leading to oligomerization. Bax or Bak oligomers create pores in the mitochondrial outer membrane and promote cytoplasmic translocation of critical apoptosis regulators such as cytochrome c and Smac/Diablo, which normally reside in the space between the outer and inner mitochondrial membranes. MOMP does not occur until proapoptotic pore-forming proteins overwhelm antiapoptotic Bcl-2-like proteins (the so-called rheostat model) (Kors- meyer et al., 1993). Under most circumstances, MOMP is a sudden process that lasts a few minutes and marks the point of no return in the commitment to cell death (Chipuk et al., 2006; Tait et al., 2010). 
Once translocated to the cytosol, cytochrome c combines with Apaf-1 and caspase-9 to form the apoptosome, which cleaves and activates effector procaspases (Fuentes-Prior and Salvesen, 2004). XIAP associates with the catalytic pocket of active effector caspases-3 and -7 blocking protease activity and promoting their ubiquitin-dependent degradation. Binding of Smac to XIAP relieves this inhibition, allowing effector caspases to cleave their substrates and cause cell death (Fuentes-Prior and Salvesen, 2004). In this Review, we describe how combining theoretical and computational approaches with live-cell imaging and quantitative biochemical analysis has provided new insight into mechanisms controlling the dynamics of extrinsic apoptosis. We start with a brief description of modeling concepts and methods relevant to apoptosis research. Next, we survey the recent literature. Modeling apoptosis, like quantitative analysis of mammalian signal transduction in general, is a field in its infancy fraught with many technical and conceptual challenges. Thus, only a subset of the known biochemistry of extrinsic apoptosis has Figure 1. Modeling Receptor-Mediated Apoptosis been subjected to computational analysis, and this analysis (A) Simplified schematic of receptor-mediated apoptosis signaling, with has been performed only in a few cell lines. Key questions, fluorescent reporters for initiator caspases (IC FRET) and effector caspases (EC FRET) indicated. The MOMP reporter measures mitochondrial outer such as differences between normal and transformed cells, membrane permeablization. have not yet been addressed in terms amenable to modeling. (B) Steps involved in converting a biochemical cartoon into a reaction diagram This Review, therefore, focuses on the subset of questions for and ordinary differential equations. C8* indicates active caspase-8. Lower panels show a model-based 12 hr simulation of the increase in tBid which modeling has provided new insight (Figure 2). 
These relative to the time of MOMP and analysis of the sensitivity of MOMP time to include: (1) How is all-or-none control over effector caspase Bid levels. The simulation in (B) was adapted from Albeck et al. (2008b). activity achieved? (2) How are activated effector caspases inhibited during the pre-MOMP delay while initiator caspase activity rises? (3) How do prosurvival and prodeath signals interact to determine if and when MOMP occurs? (4) What

Cell 144, March 18, 2011 ª2011 Elsevier Inc. 927 causes cell-to-cell variation in the timing and probability of tomata have been used to model the movement of molecules on apoptosis? We close this Review with an evaluation of current the mitochondrial outer membrane (Chen et al., 2007). When and emerging methods and future prospects. Readers inter- sufficient time-resolved quantitative data are lacking, a less ested in a more thorough description of the biology of extrinsic precise modeling framework is usually advantageous, and apoptosis are referred to several excellent reviews (Fuentes- logic-based models have proven particularly popular. Boolean Prior and Salvesen, 2004; Gonzalvez and Ashkenazi, 2010; models, for example, are discrete two-state logical models in Hengartner, 2000) and to Douglas Green’s new book Means to which each node in a network is represented as a simple on/ an End: Apoptosis and Other Cell Death Mechanisms (Green, off switch. Boolean models have been used to represent the 2011). interplay among survival, necrosis, and apoptosis pathways and to predict the likelihood that each phenotype would result Modeling Concepts Relevant for Apoptosis following changes in the levels of regulatory proteins (Calzone The term ‘‘model’’ is used in a variety of fields in the natural and et al., 2010). However, the more qualitative and phenomenolog- applied sciences to describe a mathematical or computational ical the modeling framework, the less mechanistic the insight. representation of a physical system. In molecular biology, the Regardless of modeling framework, a trade-off exists between term usually refers to a ‘‘word model’’ or narrative description model tractability and model detail or scope. 
accompanied by a diagram, although it can also refer to a cell line or genetically engineered mouse that recapitulates aspects of a human disease. In this Review, we restrict use of the term “model” to describe an executable set of rules or equations in mathematical form. We are primarily interested in models that are built and tested using detailed cellular or biochemical experiments. Models of cellular biochemistry can be based on different mathematical formalisms, from Boolean logic to differential equations, depending on the degree of detail and the scope of the modeling effort. Most models of apoptosis have been encoded using ordinary differential equations (ODEs), which describe the evolution of a system in continuous time. ODEs are the mathematical representation of mass action kinetics, the familiar biochemical approximation in which rates of reaction are proportional to the concentrations of reactants (Figure 1B) (Chen et al., 2010). Diffusion, spatial gradients, or transport can be modeled explicitly using partial differential equations (PDEs), which represent biochemical systems in continuous time and space. For example, Rehm et al. (2009) used PDEs to model the spread of mitochondrial permeabilization through a cell following an initial, localized MOMP event. Using sets of differential equations it is possible to encode a complex network of interacting biochemical reactions and then study network dynamics under the assumption that protein concentrations and reaction rates can be estimated from experimental data. Differential equation models often increase rapidly in complexity as species are added, as each new protein can give rise to a large number of model species differing in location, binding state, and degree of posttranslational modification. This problem has effectively limited data-dependent ODE/PDE models to fewer than 20 gene products (and on the order of 50–100 model species), although efforts are underway to increase this limit.

In addition to differential equations, several other formalisms have been used to model apoptosis. Stochastic models make it possible to represent reactions as processes that are discrete and random, rather than continuous and deterministic. Stochastic models are advantageous when the number of individual reactants of any species is small (typically fewer than 100) or reaction rates very slow (Zheng and Ross, 1991). In these cases, a Monte Carlo procedure is used to represent the probabilistic nature of collisions and reactions among individual molecules (Gillespie, 1977). For example, stochastic cellular au-

The inclusion of more species makes it possible to analyze biochemical processes in greater detail or to represent the operation of large networks involving many gene products, but larger models are more difficult to constrain with experimental data, and excess detail can mask underlying regulatory mechanisms. A Jorge Luis Borges story comes to mind in which the art of cartography achieved such perfection of detail that cartographers built a map of their empire with 1:1 correspondence to the empire itself, rendering the map useless (Borges and Hurley, 1999). On the other hand, although small models have the advantage of relative simplicity and even analytical tractability (i.e., capable of being solved exactly without simulation), they run the risk of grossly simplifying the underlying biochemistry and of including an insufficient number of regulatory processes. As yet, no clear principles exist to guide decisions about model scope and complexity, and most studies remain constrained by the relative immaturity of modeling software and a paucity of experimental data.

Estimating values for rate constants and initial protein concentrations (the parameters in differential equation models) remains extremely challenging both computationally and experimentally. Each reaction in an ODE model is associated with one or more “initial conditions” (the concentrations of reactants at time zero) and rate constants, usually a forward and reverse rate constant. Some of these parameters are available in the literature, typically from in vitro biochemical experiments, and these values may hold true in the context of a cell. In many other cases, however, no estimates of rate constants are available and parameters must be estimated directly from experiments (Chen et al., 2010). In addition, protein concentrations vary from cell type to cell type and should be measured directly in the cell type under investigation, although this is often not done because it is time consuming. The estimation of unknown parameter values based on data (typically, time-dependent changes in the abundance or localization of proteins in the model) is called model calibration, model training, or model fitting. Almost all realistic models of biological systems are too large for all parameters to be fully constrained by experimental data, and the models are therefore “nonidentifiable.” Thus far, the process of model calibration has been approached rather informally, but more rigorous approaches are in development (e.g., Kim et al., 2010). Careful analysis is expected to confirm the common-sense view that solid conclusions can be reached even in the case of partial knowledge.
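As a minimal sketch of the mass-action ODE formalism described above, the following Python integrates a single reversible binding reaction, A + B ⇌ C, from a set of initial conditions and a forward and reverse rate constant. The species names, parameter values, and the fixed-step Runge-Kutta integrator are illustrative assumptions and are not taken from any of the cited apoptosis models.

```python
def mass_action_rhs(state, kf, kr):
    """Time derivatives for A + B <-> C under mass-action kinetics."""
    a, b, c = state
    net = kf * a * b - kr * c  # forward rate minus reverse rate
    return (-net, -net, net)


def rk4_step(state, dt, kf, kr):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, scale):
        return tuple(x + scale * s for x, s in zip(base, slope))

    k1 = mass_action_rhs(state, kf, kr)
    k2 = mass_action_rhs(shifted(state, k1, dt / 2), kf, kr)
    k3 = mass_action_rhs(shifted(state, k2, dt / 2), kf, kr)
    k4 = mass_action_rhs(shifted(state, k3, dt), kf, kr)
    return tuple(
        x + dt / 6 * (s1 + 2 * s2 + 2 * s3 + s4)
        for x, s1, s2, s3, s4 in zip(state, k1, k2, k3, k4)
    )


def simulate(initial, kf, kr, dt, n_steps):
    """Return the list of (A, B, C) states, starting from the initial conditions."""
    state = tuple(initial)
    trajectory = [state]
    for _ in range(n_steps):
        state = rk4_step(state, dt, kf, kr)
        trajectory.append(state)
    return trajectory


# Hypothetical parameters in arbitrary units, chosen only for illustration.
trajectory = simulate(initial=(1.0, 0.5, 0.0), kf=2.0, kr=1.0, dt=0.01, n_steps=2000)
a_end, b_end, c_end = trajectory[-1]
print(f"A={a_end:.4f} B={b_end:.4f} C={c_end:.4f}")
```

Even this one-reaction system needs two rate constants and three initial concentrations, which gives a sense of how quickly the number of parameters to be calibrated grows as species are added.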

928 Cell 144, March 18, 2011 ©2011 Elsevier Inc.

Figure 2. Questions Addressed in This Review
(A and B) Composite plot of effector caspase substrate cleavage measured using a CFP-DEVDR-YFP reporter (A) or initiator caspase substrate cleavage measured using CFP-IETDG-GIETD-YFP (B) for >50 HeLa cells treated with 50 ng/ml TRAIL in the presence of cycloheximide and aligned by the average time of MOMP (red line).
(C) Fitted trajectories for initiator caspase substrate cleavage (assayed using CFP-IETDG-GIETD-YFP) in single HeLa cells treated with 10 ng/ml TRAIL in the presence of cycloheximide (fits are based on sampling at 3 min intervals). Concomitant expression of a reporter for MOMP permits a determination of the time at which mitochondria permeabilize and thus an estimation of the height of the MOMP threshold (yellow circles) and the rate of approach to the threshold (the “slope” of the green lines).
(D) Histograms of time of death in HeLa cells treated with various death ligands in the presence of cycloheximide, as determined by live-cell microscopy.
(A), (B), and (D) were adapted from Albeck et al. (2008b); (C) was adapted from Spencer et al. (2009).
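Panel (C) characterizes each cell by two quantities, the height of the MOMP threshold and the rate of approach to it. As a rough illustration only, the Python sketch below assumes that the reporter signal rises linearly from zero, so that the time of MOMP is simply the threshold divided by the slope. The linear-rise assumption, the function name, and all numerical values are hypothetical and are not taken from the figure or the cited studies.

```python
def time_to_momp(threshold, rate):
    """Time for a linearly rising signal to reach `threshold`.

    Returns infinity when the signal does not rise, in which case the
    threshold is never reached under this simplified picture.
    """
    if rate <= 0:
        return float("inf")
    return threshold / rate


# Hypothetical cells: (threshold height, rate of approach per minute), arbitrary units.
cells = {
    "fast": (0.4, 0.004),
    "slow": (0.4, 0.001),
    "high_threshold": (0.8, 0.004),
}

times = {name: time_to_momp(height, rate) for name, (height, rate) in cells.items()}
for name, minutes in times.items():
    print(f"{name}: MOMP after about {minutes:.0f} min")
```

Under this simplification, cell-to-cell differences in either the threshold or the slope are enough to spread out the time of death across a population.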

Modeling biological processes requires the collection and analysis of quantitative experimental data. An ODE model, which assumes that each compartment is well mixed, necessarily represents a single cell, and calibrating and testing ODE models therefore require collecting data on single cells over time. However, live-cell imaging experiments usually rely on genetically modified cell lines carrying fluorescent reporters. Creating these lines is relatively time-consuming, and the extent of multiplexing is limited by phototoxicity and the availability of noninterfering fluorophores. It is not always clear that an engineered reporter correctly represents the activity or state of modification of endogenous proteins (see, for example, discrepancies regarding initiator caspase activity reporters, discussed below; Albeck et al., 2008a; Hellwig et al., 2008; Hellwig et al., 2010). Flow cytometry, immunofluorescence, and single-cell PCR are also effective means to assay single cells, and biochemical experiments (immunoblotting or ELISAs for example) performed on populations of cells remain essential for quantitative biology. Although rarely addressed, effective integration of data arising from multiple measurement methods is an area in which computational models are likely to play a key role (Albeck et al., 2006).

The construction and parameterization of even a well designed model do not lead directly to a better understanding of the system: model analysis is required. The dependence of the system on parameter values is of particular interest and can be approached using sensitivity analysis. Sensitivity analysis involves systematically varying parameters (initial conditions or rate constants) while monitoring the consequences for model output (the time at which a cell undergoes apoptosis, for example). Sensitivity analysis reveals which outputs are sensitive to variation in which parameters and can be viewed as the computational equivalent of experiments that knock down or overexpress proteins while monitoring phenotype. For example, Hua et al. (2005) created an ODE model of Fas signaling and performed sensitivity analysis by varying the initial concentration of each protein species 10- or 100-fold above or below a baseline value. Using the half-time of caspase-3 activation as an output, they predicted (and confirmed experimentally) that increases but not decreases in Bcl-2 levels would alter sensitivity to FasL. From a practical perspective, sensitive parameters must be estimated with particular care if a model is to be reliable, but from a biological perspective, they represent possible means of regulation. Points in a network that exhibit extreme sensitivity to small perturbations are often referred to as “fragile” (the converse of “robust”), and considerable interest exists in the idea that fragility analysis, a concept borrowed from control theory, might be applied to biological pathways. In this view, fragile points might identify processes frequently mutated in disease or potentially modifiable using therapeutic drugs (Luan et al., 2007).

Stability analysis is another commonly used method of model analysis. Some models of biochemical networks have the interesting property of converging at equilibrium to a small set of stable states known as fixed points, where the rate of change in the concentrations of all model species is zero. Identification and characterization of fixed points can provide valuable insight into the dynamics of a system, its responses to perturbation, and the nature of regulatory mechanisms. Of particular interest in biology is bistability, a property in which a system of equations has two stable fixed points separated by an unstable fixed point. Bistability has obvious appeal in the case of apoptosis, in which cells are either alive or dead, and has been proposed to underlie

a variety of binary fate decisions such as maturation of Xenopus oocytes (Ferrell and Machleder, 1998) and lactose utilization in E. coli (Ozbudak et al., 2004). From the perspective of control, many bistable systems have two valuable properties: (1) they are insensitive to minor perturbations because the system is “attracted” to the nearest stable state (in apoptosis, a bistable system would be resistant to spontaneous activation of proapoptotic proteins, for example), and (2) they exhibit “all-or-none” transitions from one stable state to another in response to small changes in the level of a key regulatory input (a property known in biochemistry as “ultrasensitivity”). Bistable processes often exhibit hysteresis (path dependence): once in the on state, they do not readily slip back to off. It is often assumed that the regulatory machinery for apoptosis must be bistable in the mathematical sense with one equilibrium state corresponding to caspases off and “alive” and the other to caspases on and “dead” (Figure 3A). Although bistability remains the favorite framework for thinking about the switch between life and death, bistability is not strictly necessary for a switch-like transition between two distinct states (Albeck et al., 2008b). A monostable system in which the landscape changes through time can create a temporal switch between two states; in this case, the change in the landscape involves the creation, destruction, or translocation of precisely those proteins (caspases, cytochrome c, etc.) that are known to regulate apoptosis. In this regard, it should be noted that the “sharpness” of a switch in a conventional bistable system refers to the steepness of the dose-response curve (to a change in the concentration of a regulatory protein, for example), not necessarily sharpness in time. In contrast, the “all-or-nothing” switch observed by time-lapse microscopy of cells undergoing apoptosis refers to a switch from alive to dead that is sharp in a temporal sense. These considerations do not imply that the biochemical pathways controlling apoptosis are not bistable systems, but rather that bistability is not necessary a priori.

Modeling and Measuring Receptor-Mediated Apoptosis

The first model of extrinsic apoptosis was published a decade ago and set the stage for subsequent work in the field. Fussenegger et al. (2000) used emerging understanding of MOMP and caspase activation by death receptors to assemble a simple ODE model. By increasing or decreasing the levels of pairs of proteins in the model, the authors determined which combinations promoted or blocked effector caspase activation, thereby providing insight into ratiometric control over cell death by caspase-3 and XIAP (Fussenegger et al., 2000). At the same time, the development of fluorescent reporters for MOMP and caspase substrate cleavage allowed several groups to collect data on the dynamics of apoptosis in single cells. These data showed that following exposure to inducers of either intrinsic or extrinsic apoptosis (UV light, actinomycin D, staurosporine, or TNF), cells wait for several hours before initiating a rapid chain of events that triggers MOMP and activates effector caspases (Goldstein et al., 2000a, 2000b; Tyas et al., 2000). This contrasts with data obtained by western blotting and other population-average biochemical assays that suggested that MOMP and caspase activation occur gradually over a period of several hours. The two types of data can be reconciled by noting that apoptosis is sudden and switch like in individual cells, but that it takes place at different times in different cells (Figure 3B) (Goldstein et al., 2000a; Goldstein et al., 2000b; Tyas et al., 2000).

All-or-None Control over Effector Caspase Activity

Goldstein et al. (2000b) used time-lapse imaging of cytochrome c translocation to obtain the first data on the kinetics of MOMP. They observed the time between proapoptotic insult and MOMP to vary depending on the type and strength of the stimulus (ranging from 4–20 hr following exposure to the pan-specific kinase inhibitor staurosporine and 9–17 hr following exposure to UV light), but the rate and extent of cytochrome c release were constant, taking 5 min to reach completion. Further understanding of the link between MOMP and caspase activation was made possible by the development of intramolecular Förster resonance energy transfer (FRET) reporters for caspase-mediated proteolysis. The first FRET reporters for monitoring caspase activity by time-lapse microscopy linked cyan fluorescent protein (CFP) to yellow fluorescent protein (YFP) using a polypeptide linker containing the amino acid sequence DEVD, a substrate for caspase-3 (CFP-DEVD-YFP) (Rehm et al., 2002; Tyas et al., 2000). Prior to reporter cleavage, CFP lies in close proximity to YFP, causing FRET between the two fluorescent proteins and reducing CFP emission. Following cleavage of the DEVD-containing linker, the efficiency of FRET drops dramatically, increasing the CFP to YFP fluorescence ratio. Time-lapse imaging of cells expressing CFP-DEVD-YFP revealed that caspase-3 is also activated rapidly, taking <15 min to reach completion (Rehm et al., 2002; Tyas et al., 2000).

Rehm et al. (2002) asked whether the cleavage kinetics of effector caspase substrates depended on the identity or strength of the apoptotic stimulus. Like Goldstein et al. (2000b), Rehm and colleagues observed the delay between exposure to a prodeath stimulus and the onset of effector caspase activation to vary from cell to cell. They also noted that the average delay varied with the dose and identity of the proapoptotic stimulus (3 ng/ml or 200 ng/ml TNF, 3 µM staurosporine, 10 µM etoposide), but the kinetics of reporter cleavage did not. The authors developed a quantitative description of these data, showing that caspase activation in individual cells fits a sigmoidal Boltzmann equation in which the lag time is dose and stimulus dependent, but cleavage kinetics are dose invariant (Figures 3C and 3D) (Rehm et al., 2002). Subsequent multiplex imaging of MOMP and effector caspase reporters in single cells showed that MOMP precedes effector caspase activation by 10 min (Rehm et al., 2003). In electromechanical terminology, the regulation of MOMP and effector caspase activity constitutes a variable delay, snap-action switch.

Intrigued by the idea that a switch is central to the regulation of apoptosis, several groups have attempted to understand how such a switch might arise, based on models in which bistability is assumed as a design principle. Eissing et al. (2004) created an 8-equation ODE model of apoptosis in a type I cell that included activation of caspase-8, consequent cleavage and activation of caspase-3, inhibition and degradation of activated caspase-3 by XIAP, and activation of residual caspase-8 by activated caspase-3 in a feedback loop. The small size of the model made it possible to identify stable states analytically, and the

authors found that adding a mechanism to inhibit active caspase-8 (via the protein Bar) was necessary to ensure bistability at the level of effector caspase activity (Eissing et al., 2004). A subsequent modeling study that examined how cells would resist spontaneous procaspase-8 activation argued against a major role for Bar, however (Wurstle et al., 2010). Legewie et al. (2006) created a 13-ODE model that described activation of caspase-9 by Apaf-1, consequent activation of caspase-3, and inhibition of caspases by XIAP. The authors identified an “implicit” or hidden positive feedback loop as a key contributor to bistability; in this loop, caspase-3 promotes its own activation by sequestering XIAP away from caspase-9, allowing caspase-9 to cleave additional procaspase-3 (Legewie et al., 2006). Bagci et al. (2006) built models of varying complexity (the largest being 31 ODEs) centered on apoptosome formation and caspase-3 activation and concluded that cooperativity in the formation of the apoptosome was a key element for ensuring bistability (Bagci et al., 2006). Chen et al. (2007) constructed both an ODE model and a stochastic cellular automaton model to examine the potential for interactions among Bcl-2-family members to generate bistability at MOMP. These models included activation of Bax by an activator such as tBid, inhibition of the activator and Bax by Bcl-2, and displacement of the activator in the activator-Bcl-2 complex by Bax. This description of Bax and Bcl-2 also encoded an implicit positive feedback loop in which freed activator could bind more Bax, leading to bistability in pore formation. Addition of cooperativity in Bax multimerization resulted in a model encoding a one-way (as opposed to bidirectional or hysteretic) switch (Chen et al., 2007). In a corroborating study that used flow cytometry, an antibody against activated Bax (clone 6A7) revealed a bimodal distribution in the staining of HeLa cells treated with 400 nM staurosporine for 6 hr (Sun et al., 2010). However, antibody staining is unreliable in dying/dead cells, so proving the point will require showing bimodality in Bax activation in cells that have not yet undergone effector caspase activation. Most recently, Ho and Harrington (2010) built a small ODE model in which FasL acts as a clustering agent for Fas receptors. The reactions described spontaneous receptor opening and closing, constitutive destabilization of open receptors, and ligand-independent and -dependent stabilization of receptor clusters. Analytical methods showed the system to exhibit reversible bistability (hysteresis) at low receptor concentrations but irreversible bistability at higher local receptor densities (Ho and Harrington, 2010). In summary, this set of papers reveals that almost every point in the apoptosis pathway has the potential to generate bistability in the mathematical sense. However, many of the papers were written in an era in which it was not yet common for mathematical modeling to be combined with quantitative experimentation in a single manuscript. The results of simulation were compared to data from the literature, but proposed regulatory mechanisms were not confirmed using RNAi or other perturbation-based experiments.

Whereas the first generation of apoptosis models focused on specific steps in the process of cell death (MOMP, apoptosome formation, etc.), Albeck et al. (2008b) built a model that spanned the entire pathway of extrinsic apoptosis from ligand binding to cleavage of effector caspase substrates, albeit in simplified form. A model comprising 58 differential equations was sufficient to capture the essence of TRAIL-receptor binding, cleavage of initiator and effector caspases, initiation of MOMP, release of Smac and cytochrome c into the cytoplasm, and finally caspase-3 activation and substrate cleavage. The model was trained against experimental data that included live-cell microscopy, immunoblotting, and flow cytometry in wild-type HeLa cells or cells perturbed by protein overexpression or RNAi-mediated protein depletion. Model analysis and experiments confirmed earlier evidence that MOMP is the point in receptor-mediated apoptosis at which upstream signals are transformed into an all-or-none snap-action signal (Goldstein et al., 2005, 2000b; Madesh et al., 2002; Rehm et al., 2003; von Ahsen et al., 2000). To understand how this switch might work in molecular terms, Albeck et al. (2008b) analyzed a series of models of increasing complexity and biochemical realism that linked tBid cleavage by initiator caspases to Smac/cytochrome c release. The performance of each model was analyzed for its ability to create a variable-delay, snap-action switch. A useful insight was that the “rheostat model” (Korsmeyer et al., 1993), in which it was postulated that MOMP is triggered when levels of active Bax/Bak exceed those of Bcl-2/BclxL, only functioned in its simplest form as a switch if Bax and Bcl-2 were assumed to associate irreversibly at a rate faster than diffusion. In contrast, snap-action switching emerged naturally from the biochemistry of Bax and Bcl-2 if more complex reaction topologies were assumed; these included slow activation of Bax by tBid, partitioning of reactants into cytosolic and mitochondrial compartments, and a requirement for Bax multimerization. Rapid and complete translocation of Smac and cytochrome c was ensured in part by the favorable kinetics of moving proteins down a steep concentration gradient from the mitochondrion (where they are abundant) to the cytosol (where they are initially absent). Despite the apparent success of the Albeck et al. (2008b) model, it is important to realize that it involves a simple picture of DISC formation as well as a simplified version of MOMP that lacks the multiplicity of positive- and negative-acting Bcl-2-family members present in real cells.

Inhibition of Effector Caspases during the Pre-MOMP Delay

Better understanding of caspase substrate specificity (Luo et al., 2003; Stennicke et al., 2000; Thornberry et al., 1997) along with a direct comparison of CFP-DEVD-YFP cleavage kinetics with those of endogenous substrates (Albeck et al., 2008a) made clear that the CFP-DEVD-YFP biosensor is processed by both effector and initiator caspases. Changing the biosensor linker to DEVDR made it 20-fold more selective for caspase-3 relative to caspase-8 (Albeck et al., 2008a), and changing the cleavage recognition site to IETD resulted in a FRET reporter selective for initiator caspases (Luo et al., 2003). Combining this selective effector caspase reporter with a MOMP reporter showed that effector caspase activity is negligible during the pre-MOMP delay (Albeck et al., 2008a); this had correctly been assumed to be true by Rehm et al. (2002), despite the use of a less specific CFP-DEVD-YFP reporter. In contrast, initiator caspases are active throughout the pre-MOMP delay (Albeck et al., 2008a; Hellwig et al., 2008), and their substrates Bid and procaspase-3 accumulate in cleaved form. Caspase-3 is a very potent enzyme, and model-based simulation and experiments suggest

Figure 3. Using Models to Understand Data
(A) Energy landscape showing frameworks for achieving two distinct states. Left: A bistable system has two stable steady states for all time (once equilibrium is reached), corresponding to alive and dead. Right: A monostable system starts with a single stable “alive” state; once the model starts to evolve, the landscape morphs as proteins are created and destroyed, producing a single stable “dead” state at late times. Making the transition unidirectional requires processes such as a threshold.
(B) Top: Immunoblot analysis of PARP cleavage in HeLa cells treated with 10 ng/ml TRAIL in the presence of cycloheximide; PARP is an effector caspase substrate. Bottom: Simulation of the time course of effector caspase (EC) substrate cleavage in individual cells (blue lines), overlaid with an average (pink line) that depicts the fraction of cells in which caspases have been activated; this average mimics the data obtained by immunoblotting.
(C) Idealized single-cell time course for effector caspase substrate cleavage. The dynamics have the form of a sigmoidal Boltzmann equation in which c(t) is the amount of substrate cleaved at time t, f is the fraction cleaved at the end of the reaction, Td is the delay period between TRAIL addition and half-maximal substrate cleavage, and Ts is the switching time between initial and complete effector caspase substrate cleavage (the reciprocal of the slope at t = Td).
(D) Effector caspase substrate cleavage in individual HeLa D98 cells expressing myc-CFP-DEVD-YFP in response to the indicated doses of TNF. Data from each cell have been fit with the sigmoidal Boltzmann function.
(E) Simulation showing effector caspase substrate cleavage as a function of XIAP concentration. At high concentrations, effector caspase substrate cleavage is blocked; at low concentrations, effector caspases are activated rapidly; and at concentrations of XIAP between 0.15 and 0.30 µM, effector caspase substrate cleavage proceeds slowly and only reaches submaximal levels.
(F) A simulation showing how the initial concentrations of procaspase-8 and cFLIPL determine whether NF-κB is activated, effector caspases are activated, or both after Fas stimulation. The white circle indicates the estimated level of procaspase-8 and cFLIPL in HeLa-CD95 cells.
(A) and (C) were adapted from Albeck et al. (2008b); (D) was adapted with permission from Rehm et al. (2002), J. Biol. Chem. 277, 24506–24514, copyright 2002 The American Society for Biochemistry and Molecular Biology. All rights reserved; (E) was adapted from Rehm et al. (2006) by permission from Macmillan Publishers Ltd: EMBO J. 25, 4338–4349, copyright 2006. (F) was adapted from Neumann et al. (2010) by permission from Macmillan Publishers Ltd: Mol. Syst. Biol. 6, 352, copyright 2010.
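The caption to panel (C) names the quantities in the sigmoidal Boltzmann description but does not write out the equation. The Python sketch below uses one logistic parameterization that is consistent with those definitions, c(t) = f / (1 + exp(-4 (t - Td) / Ts)): it is half-maximal at t = Td, and its slope there is f / Ts, so Ts is the reciprocal of the slope when f = 1. This functional form is an assumption made for illustration and may differ from the exact expression used in the cited studies; the function name and parameter values are likewise hypothetical.

```python
import math


def cleaved_substrate(t, f, td, ts):
    """Assumed sigmoid: c(t) = f / (1 + exp(-4 * (t - td) / ts)).

    f  : fraction cleaved at the end of the reaction
    td : delay to half-maximal cleavage
    ts : switching time (slope at t = td equals f / ts)
    """
    x = 4.0 * (t - td) / ts
    # Evaluate the logistic in a numerically stable way for large |x|.
    if x >= 0:
        return f / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return f * ex / (1.0 + ex)


# Hypothetical cells (times in minutes): same switching time, different delays,
# mimicking a dose-dependent lag with dose-invariant cleavage kinetics.
for td in (120.0, 240.0):
    course = [cleaved_substrate(t, f=1.0, td=td, ts=20.0) for t in range(0, 361, 60)]
    print(td, [round(c, 3) for c in course])
```

Changing Td shifts the curve along the time axis without altering its steepness, which is the behavior the caption attributes to cells exposed to different doses.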

that 400 active molecules are sufficient to cleave 10⁶–10⁷ molecules of cellular substrate within several hours (Albeck et al., 2008a). However, during the pre-MOMP delay, no processing of effector caspase substrates can be detected using live-cell FRET reporters or flow cytometry (cf. Figures 2A and 2B). This raises the interesting question: How are effector caspases maintained in an off state despite being continually processed by initiator caspases from a zymogen into a cleaved and potentially active form?

One mechanism for keeping processed effector caspases “off” is binding of XIAP to the catalytic cleft of caspase-3. This tight interaction (1 nM) might seem sufficient to hold caspase-3 in check, but modeling shows that a >100-fold molar excess of XIAP over caspase-3 would be required to ensure effective inhibition of caspase-3 proteolytic activity over the course of a typical 2–6 hr pre-MOMP delay. The requirement for such a large excess of XIAP over caspase-3 arises because competitive inhibition is reversible whereas substrate cleavage is not and because substrates, which are abundant, are in competition with XIAP for access to the caspase catalytic site. As XIAP and caspase-3 are present at roughly equal concentrations in HeLa cells, simple competitive inhibition cannot be the sole inhibitory mechanism. XIAP is an E3 ligase able to promote ubiquitination and degradation of caspase-3, and simulation suggests that a combination of competitive inhibition and caspase degradation would constitute an effective means of regulation (Albeck et al., 2008a). Confirming these predictions, depletion of XIAP by RNAi or pharmacological inhibition of the proteasome was observed to cause

effector caspase activation prior to MOMP (Albeck et al., 2008a). Deletion of XIAP in the mouse or truncation of the ubiquitination-promoting RING domain also caused elevated caspase-3 activity and sensitivity to apoptosis (Schile et al., 2008), demonstrating a critical role for XIAP-mediated ubiquitination of caspase-3 in vivo. The pre-MOMP delay evidently constitutes a “latent” death state in which effector procaspases are actively processed by initiators but are held in check by XIAP until Smac is released during MOMP. The reasoning that led to this conclusion illustrates the value of making models explicit and analyzing them computationally: a biochemical mechanism that seems adequate on its face (Bcl-2-Bax binding in the rheostat model or competitive inhibition of C3 by XIAP during the pre-MOMP delay) proves insufficient when actual protein levels and rates of reaction are taken into account. In this sense, quantitative analysis can fundamentally change our qualitative understanding of a regulatory mechanism. It should be noted, however, that current models of receptor-mediated apoptosis in type II cells cannot completely restrain pre-MOMP caspase-3 activity when experimentally measured procaspase-3 and XIAP concentrations are used. Although XIAP-mediated degradation of active caspase-3 is necessary, raising this degradation rate too much compromises the switch-like activation of effector caspase substrate cleavage post-MOMP. Reconciliation of all experimental observations awaits the development of more sophisticated and complete models.

If XIAP is partially depleted by RNAi and MOMP is blocked by overexpression of Bcl-2, a sublethal level of effector caspase activity is generated and effector caspase substrates are only partially processed; moreover incomplete cleavage of caspase-3 substrates does not necessarily cause cell death (at least in HeLa cells) (Albeck et al., 2008a). Modeling and experiments with XIAP overexpression suggest three possible outcomes depending on XIAP levels: with [XIAP] < 0.15 µM, effector caspase substrate cleavage is complete; at [XIAP] > 0.30 µM, cleavage is fully inhibited; and at intermediate XIAP concentrations, slow submaximal effector caspase substrate cleavage occurs (Figure 3E) (Rehm et al., 2006). Thus, alteration of XIAP levels disrupts normal switch-like control over effector caspase activation and interferes with the normal link between caspase activation and cell killing. Activation of CAD in the absence of cell death is expected to be particularly problematic since it has the potential to cause genomic instability (Lovric and Hawkins, 2010) and has been proposed to be the trigger of the chromosomal translocations observed in some leukemias (Betti et al., 2005; Vaughan et al., 2002, 2005; Villalobos et al., 2006).

The role of XIAP in restraining caspase-3 in the absence of MOMP makes it a central factor in controlling type I versus type II apoptosis. Jost et al. (2009) observed that inhibition of XIAP function by gene targeting or a Smac mimetic drug caused type II cells to adopt a type I phenotype. Bid deficiency protected hepatocytes and pancreatic β cells from FasL-induced apoptosis (fulfilling the definition of mitochondria-dependent type II death), but concomitant loss of XIAP (in Bid−/− XIAP−/− mice) restored FasL sensitivity, thereby demonstrating a switch to type I behavior (Jost et al., 2009). Type I cells are defined as not requiring MOMP for apoptosis, but blockade of the mitochondrial pathway via Bid depletion or Bcl-2 overexpression in type I cells has been observed to reduce effector caspase activity and to increase the number of cells surviving TRAIL exposure (Maas et al., 2010). Both type I and type II pathways, therefore, appear to depend to a greater or lesser extent on the mitochondrial pathway, either for regulating XIAP and activating effector caspases or for killing cells by disrupting essential mitochondrial functions.

Determinants of the Timing and Probability of MOMP

Apoptosis proceeds at different rates in different cells, even among members of a clonal population. Some cells die within 45 min of exposure to FasL or TRAIL, whereas other cells in the same dish wait 12 hr or more. A simple way to conceptualize control over the timing of apoptosis in single cells is that the level of active receptor determines the amount of active caspase-8/10, which sets the rate of tBid cleavage and, thus, the rate of approach to a threshold that must be overcome for MOMP to occur (Figure 2C). The height of this threshold is set by the relative levels of competing pro- and antiapoptotic Bcl-2-family proteins (Chipuk and Green, 2008). We discuss below recent advances in our understanding of the MOMP threshold and return later to the determinants of the rate of approach to the threshold. Using fluorescent measurements in a purified in vitro system, Lovell et al. (2008) simultaneously measured the rates of three reactions leading to pore formation and determined the following order of events. First tBid binds rapidly to mitochondrial membranes where tBid and Bax interact, promoting insertion of Bax into the membrane, a rate-limiting step. Bax then oligomerizes to form pores, and membranes become permeable. In vitro, Bax oligomerization continues even after membranes are permeabilized (Lovell et al., 2008). In cell culture, Bax multimerization is first detected immediately prior to MOMP and then continues for at least 30 min, ultimately generating many more Bax puncta or pores than the number required for MOMP (Albeck et al., 2008b; Dussmann et al., 2010). Formation of the first observable Bax (or Bak) puncta correlates temporally and spatially with the first subset of mitochondria to undergo MOMP. Pore formation and MOMP then spread through the cell as a wave with a velocity of 0.6 µm/s, a process that has been modeled using a PDE network (Rehm et al., 2009). The process of pore formation proceeds more rapidly at higher doses of TRAIL, presumably due to an increased rate of procaspase-8 activation (Rehm et al., 2009). However, it has recently been observed that in a subset of HeLa cells, MCF-7 cells, and murine embryonic fibroblasts, some mitochondria fail to undergo MOMP in response to diverse proapoptotic stimuli (actinomycin D, UV, staurosporine, or TNF). The subset of mitochondria that remain intact fail to accumulate GFP-Bax puncta but undergo complete MOMP when treated with the Bcl-2 antagonist and investigational therapeutic ABT-737, suggesting that resistance of mitochondria to MOMP lies at the point of Bax/Bak activation (Tait et al., 2010). These findings suggest that mitochondria in a single cell differ from each other with respect to their sensitivities to proapoptotic stimuli and that MOMP might not always be an all-or-none event at the single-cell level (Tait et al., 2010).

Time-lapse imaging of initiator caspase and MOMP reporters shows that the height of the MOMP threshold varies from cell to cell. MOMP is triggered following cleavage of 10% of a reporter carrying one IETD recognition site (Hellwig et al., 2008, 2010) or

30%–60% of a reporter carrying two recognition sites (Albeck et al., 2008a, 2008b; Spencer et al., 2009). Variation in the height of the MOMP threshold from cell to cell (presumably arising from variation in the levels of Bcl-2-family proteins, see below) can most easily be resolved using the sensitized dual-IETD reporter (Figure 2C) and contributes 20% of the total variability in the time of death among HeLa cells in clonal population exposed to 10 ng/ml TRAIL (Spencer et al., 2009). The remaining 80% of the variability appears to reflect differences in the rate of Bid cleavage, although these percentages are expected to change with stimulus and cell type. However, the precise dynamics of Bid cleavage have recently been thrown into some doubt: a FRET reporter containing full-length Bid rather than an artificial IETD caspase recognition site exhibits minimal cleavage prior to MOMP (Hellwig et al., 2010). One explanation for this discrepancy is that IETD-only reporters might be overly sensitive and not reflect the kinetics of endogenous substrate cleavage. In this view, cleavage of Bid by initiator caspases is subject to additional forms of regulation so that tBid does not accumulate until just before MOMP (Hellwig et al., 2010). Conversely, the Bid-containing FRET reporter might simply be insufficiently sensitive, and levels of tBid required for MOMP (estimated to be <3% of the total Bid pool) might be below the level of detection. In this view, IETD-only reporters conveniently amplify a signal that would otherwise be undetectable. Resolving this question will require careful comparison of reporter constructs with endogenous proteins, which will itself depend on the availability of antibodies that can distinguish different caspase-8 substrates. It seems likely that carefully calibrated models will also help with data integration.

The rate at which initiator caspase substrates are cleaved varies from cell to cell (Figure 2C). The current view is that the strength of receptor signaling and the amount of active DISC control the rate of initiator caspase substrate cleavage and thus the rate of approach to the MOMP threshold, with lower levels of prodeath stimulus leading to slower Bid cleavage and slower onset of apoptosis. Models of DISC formation in FasL-treated cells have questioned whether apoptosis simply slows down with decreasing ligand concentrations (a continuous decrease), or whether there is a minimum ligand:receptor ratio needed for induction of apoptosis (a threshold; Bentele et al., 2004). Modeling predicted that below a critical ligand:receptor ratio of 1:100, apoptosis is completely blocked due to the presence of the inhibitory DISC component c-FLIP. Above the critical threshold, c-FLIP is insufficient to block all DISC activity prior to the formation of active caspase-8. A follow-up study refined this view by showing that active DISC is formed at concentrations of a receptor crosslinking antibody (anti-APO-1, which activates Fas receptors) below a critical threshold. However, because c-FLIP has a higher affinity than procaspase-8 to the few DISCs that are formed, activation of caspase-8 is effectively inhibited (Lavrik et al., 2007). Continuing this line of reasoning, Fricker et al. (2010) used modeling, biochemical assays, and live-cell imaging to explore how levels of c-FLIP isoforms determine sensitivity to Fas signaling. Although c-FLIPS/R is well established as an inhibitor of Fas-mediated apoptosis, the role of c-FLIPL has been controversial because it plays both pro- and antiapoptotic roles depending on expression level. Model analysis suggested that the effects of c-FLIPL on activation of procaspase-8 vary with FasL levels: relative to cells having endogenous levels of c-FLIPL, 20-fold overexpression of c-FLIPL blocks cell death when FasL levels are low but accelerates death when FasL levels are high. However, even at high FasL levels, a further increase in c-FLIPL concentration inhibits procaspase-8 processing and decreases the extent of cell death. Models can be quite helpful in exploring these sorts of quantitative relationships. One explanation supported by model analysis involves the fact that c-FLIPL has higher affinity for DISCs than procaspase-8 but that procaspase-8 is present in cells in substantial molar excess. At low FasL levels, few DISCs are formed, relative affinities dominate, and the ratio of c-FLIPL to caspase-8 at DISCs is high. At high levels of FasL, many DISCs are formed, and the small number of c-FLIPL molecules (300 per HeLa cell) is exhausted, allowing DISC-bound procaspase-8 to overwhelm c-FLIPL. Thus, subtle changes in the levels of c-FLIPL and FasL can change the timing and probability of death in nonlinear ways that can be understood only if the concentrations of interacting proteins are taken into account (Fricker et al., 2010).

An additional factor affecting the life-or-death fate of a cell exposed to death ligand is the interplay between prosurvival and proapoptotic pathways. The relative strength of these competing regulatory processes is also thought to be controlled by the composition of the DISC. Induced survival signaling has been largely attributed to NF-κB, and many negative regulators of apoptosis are known to be induced by NF-κB, including c-FLIP, BclxL, and members of the IAP family (Gonzalvez and Ashkenazi, 2010). It is not yet clear which of these factors is most important nor whether NF-κB-independent processes, such as MAPK signaling, also play important prosurvival functions. Whereas Bentele et al. (2004) and Fricker et al. (2010) focused on the presence or absence of prodeath signaling at the DISC, Lavrik et al. (2007) and Neumann et al. (2010) explicitly focused on the balance between prodeath versus prosurvival activities. Lavrik et al. (2007) demonstrated that Erk kinase is activated in response to anti-APO-1 over a wide range of doses, even in the presence of a pan-caspase inhibitor. This work showed that survival signaling occurred in parallel with death signaling, but it was not clear how the survival signal was initiated. Neumann et al. (2010) built and tested an ODE model of Fas-mediated apoptosis with a postulated link between apoptosis and survival pathways in which c-FLIPL is cleaved by caspase-8 into p43-FLIP, which binds and activates IκB kinase (IKK). IKK then phosphorylates and inhibits IκB, a negative regulator of NF-κB, leading to induction of NF-κB-mediated transcription. Simulation and experiments suggested that both proapoptotic (caspase-8 dependent) and prosurvival (NF-κB-dependent) pathways are activated in parallel and that a subtle balance between c-FLIPL and initiator caspase levels determines which one predominates. By visualizing the levels of c-FLIPL and procaspase-8 on a parameter landscape (Figure 3F), the authors showed that c-FLIPL can disable, promote, or inhibit NF-κB activation depending on whether the level of c-FLIPL is low, intermediate, or high. This effect arises because high levels of c-FLIPL prevent caspase-8-mediated processing of c-FLIPL into the IKK-binding p43-FLIP form (Neumann et al., 2010).
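The three regimes described above (c-FLIPL low, intermediate, or high) can be made concrete with a deliberately minimal toy calculation. This is not the ODE model of Neumann et al. (2010); it only assumes that p43-FLIP formation needs both c-FLIPL (the substrate) and caspase-8 (the enzyme) at the DISC, and that the two compete for DISC sites in proportion to abundance weighted by affinity. The function name, the unitless levels, and the tenfold affinity advantage are hypothetical choices made for illustration.

```python
def p43_flip_proxy(flip_level, procaspase8_level=1.0, flip_affinity_advantage=10.0):
    """Unitless proxy for p43-FLIP formation at the DISC (toy assumption).

    flip_level and procaspase8_level are relative abundances; the affinity
    advantage of c-FLIPL over procaspase-8 is a hypothetical value.
    """
    weighted_flip = flip_affinity_advantage * flip_level
    if weighted_flip + procaspase8_level == 0:
        return 0.0
    # Share of DISC sites held by c-FLIPL under simple weighted competition.
    flip_fraction = weighted_flip / (weighted_flip + procaspase8_level)
    # Product of the substrate share (c-FLIPL) and the enzyme share (caspase-8).
    return flip_fraction * (1.0 - flip_fraction)


for level in (0.0, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"c-FLIPL = {level:>7}: p43-FLIP proxy = {p43_flip_proxy(level):.4f}")
```

Under these assumptions the proxy is zero without c-FLIPL, peaks when the two species share the DISC equally, and falls again once c-FLIPL crowds out caspase-8, which mirrors the qualitative disable, promote, inhibit pattern without claiming any quantitative agreement with the published landscape.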

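The titration argument attributed to Fricker et al. (2010) earlier in this section, in which a scarce high-affinity inhibitor dominates when DISCs are few and is swamped when DISCs are many, can also be sketched as an equilibrium calculation. The code below is not the published model. It assumes simple reversible 1:1 binding of c-FLIPL and procaspase-8 to a shared pool of DISC sites, with all quantities in molecules per cell. Only the figure of 300 c-FLIPL molecules per HeLa cell comes from the text; the procaspase-8 copy number and both dissociation constants are hypothetical values chosen to make the trend visible.

```python
def disc_occupancy(sites, flip=300.0, casp8=1.0e5, kd_flip=1.0, kd_casp8=1.0e3):
    """Return (bound c-FLIPL, bound procaspase-8) at equilibrium.

    All quantities are in molecules per cell. Only flip=300 is taken from the
    text; casp8 and both Kd values are hypothetical illustration values.
    """
    def total_sites(free):
        # Free sites plus sites occupied by each competitor (1:1 binding isotherms).
        return free + flip * free / (kd_flip + free) + casp8 * free / (kd_casp8 + free)

    lo, hi = 0.0, float(sites)
    for _ in range(200):  # bisection: total_sites() rises monotonically with free
        mid = 0.5 * (lo + hi)
        if total_sites(mid) < sites:
            lo = mid
        else:
            hi = mid
    free = 0.5 * (lo + hi)
    return flip * free / (kd_flip + free), casp8 * free / (kd_casp8 + free)


for n_sites in (10, 100, 1_000, 10_000, 50_000):
    bound_flip, bound_casp8 = disc_occupancy(n_sites)
    ratio = bound_flip / bound_casp8
    print(f"{n_sites:>6} DISC sites: c-FLIPL/procaspase-8 at DISC = {ratio:.3f}")
```

With these assumed parameters the ratio of DISC-bound c-FLIPL to procaspase-8 is above one when only a handful of sites exist and drops far below one once the number of sites exceeds the c-FLIPL pool, which is the nonlinearity the text describes.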
Future work on the topic of induced survival signaling would benefit from single-cell measurements combining reporters for NF-κB target gene expression (Nelson et al., 2004) and reporters of initiator caspase activity so as to capture feedback. Both the apoptosis and the NF-κB fields have a tradition of utilizing live-cell imaging and mathematical modeling (e.g., Ashall et al., 2009; Hoffmann et al., 2002; Lee et al., 2009), and it would be valuable to combine models of both processes. This would lead to better understanding of competing prosurvival and prodeath processes in different cell types.

Cell-to-Cell Variation in the Timing of Apoptosis

Individual cells differ widely in their responses to apoptotic stimuli (Figure 2D). Potential sources of cell-to-cell variability in the timing and probability of apoptosis include genetic or epigenetic differences, differences in cell-cycle phase, stochastic fluctuations in biochemical reactions, and natural variation in protein concentrations. To distinguish among these possibilities, three independent groups followed dividing cells using time-lapse microscopy and compared the timing and probability of apoptosis in sister cells and in randomly selected pairs of cells (Bhola and Simon, 2009; Rehm et al., 2009; Spencer et al., 2009). At a dose of TRAIL sufficient to induce apoptosis in half of the cells, the probability of death was observed to be highly correlated between sisters, as was the time at which cells died. Correlation in death time among sister cells has been observed in a variety of cell types (HeLa, MCF-10A, NIH 3T3, HT1080, and murine embryonic fibroblasts) following exposure to a variety of apoptosis-inducing agents (TRAIL, TNF-α, staurosporine, and etoposide). In contrast, randomly selected cells were found to be uncorrelated, and no obvious correlation with cell-cycle phase or with position in the dish could be detected (Bhola and Simon, 2009; Spencer et al., 2009), although the way in which these experiments were performed does not rule out some contribution from cell-cycle state (Rehm et al., 2009). Importantly, the degree of similarity between sisters fell as the time since cell division increased so that within one to two generations, sisters were no more correlated than randomly chosen pairs of cells. This transient heritability in timing of death argues against a genetic or epigenetic explanation for cell-to-cell variability in apoptosis, as genetic and epigenetic differences tend to be stable over much longer timescales. The initial correlation between sister cells also rules out a significant role for stochasticity in the reactions that regulate caspase activation, a conclusion supported by simulation (Eissing et al., 2005). Transient heritability in the timing and probability of death data are most consistent with an explanation rooted in natural cell-to-cell variation in the levels or activities of proteins among genetically identical cells. Sister cells are known to inherit similar levels of relatively abundant biomolecules during cytokinesis, but levels then diverge due to random fluctuations in protein synthesis and degradation (Sigal et al., 2006). In support of this, experiments with cycloheximide show that the rate of sister cell decorrelation is highly sensitive to the rate of protein synthesis (Spencer et al., 2009).

Over the last decade, modeling and experimentation in bacteria, yeast, and more recently in mammalian cells, have provided a mechanistic framework for understanding stochastic variation (“noise”) in rates of transcription and translation (Raj and van Oudenaarden, 2008). The number of transcriptional initiation complexes on any single gene is small (potentially as small as 1–2), and the probability that a transcript will be created in any time interval is therefore highly stochastic. Fluctuations in mRNA levels result in fluctuating rates of protein synthesis. With short-lived or low-copy-number proteins, this can cause large fluctuations in protein levels, whereas with relatively abundant proteins, such as those controlling apoptosis, the most significant effect is that different cells contain different concentrations of each protein, and thus unique proteomes. Current models predict that the distribution of concentrations across a population of cells should be long-tailed, following a log-normal or gamma distribution (Friedman et al., 2006; Krishna et al., 2005). In the case of proteins controlling apoptosis, flow cytometry reveals a nearly log-normal distribution with a coefficient of variation (CV; a unit-less measure of variability equal to the standard deviation divided by the mean) ranging from 0.2 to 0.3 (Spencer et al., 2009). Such a spread results in cells in the top 5th percentile having >2.5× higher protein expression compared to cells in the bottom 5th percentile (Niepel et al., 2009). The question then arises of whether such modest variation in protein levels is sufficient to explain the observed variation in the timing of cell death. Model-based simulation suggested that it is: when the distribution of cell death times was computed for TRAIL-induced apoptosis assuming log-normally distributed protein concentrations, a close match was observed between the variability in simulation and experiment (Spencer et al., 2009). In the absence of any simple experimental test, the match between simulation and measurement increases our confidence in the hypothesis that natural variation in the levels of apoptotic regulators is responsible for variability in the time and probability of cell death.

Is it possible to establish a direct link between the levels of any single protein and the probability and timing of apoptosis? In principle such a measurement could be made by fluorescently tagging proteins of interest at the endogenous locus and then relating their levels to time of death using live-cell microscopy. However, mathematical modeling suggests that achieving reasonable predictability over cell fate would require single-cell measurement of many protein levels (as well as some posttranslational modifications), a difficult task. Alternatively, simulation suggests that predictability can be achieved by measuring the rates of critical reactions, such as the processing of caspase-8 substrates. Because this rate depends on the levels of multiple upstream proteins, measuring it is much more informative than simply knowing protein levels (Spencer et al., 2009). This conclusion implies a fundamental limit to our ability to predict cell fate based on single-cell proteomics.

Conclusions and Future Prospects

Key goals for a combined model- and experiment-driven analysis of apoptosis are to understand how multiple cooperating and competing signals are integrated to effectively execute a binary death-survival decision, to determine why some processes and proteins are important in one cell type and not in another, and to predict the responses of cells to death ligands and chemotherapy drugs. A review of the literature thus far suggests that these goals remain largely unfulfilled. Skeptics will argue that quantitative analysis can only add details to

existing conceptual frameworks or that mathematical models are too theoretical and too dependent on assumptions to be useful (although drawing a pathway diagram may involve just as many assumptions). A more generous and realistic assessment would be that mechanistic modeling of apoptosis has had an impact in motivating the collection and analysis of quantitative single-cell data, critically evaluating potential regulatory mechanisms, and investigating the origins of cell-to-cell variability.

Technical Challenges

Addressing the long-term goals of quantitative, model-driven biology will require major conceptual and technical advances. Most computational tools currently in use have been adapted from other fields, but understanding a biological system is nothing like fixing a radio. Cells are not well-mixed systems as encountered in chemistry, nor are they easily understood in terms of fundamental physical laws or obviously subject to the design principles (such as modularity) encountered in engineered systems. They resemble all of these to some extent, but systems biology is currently immersed in the uncharted process of working out which concepts from chemistry, physics, and engineering are most useful in understanding cells and tissues.

It is already evident that different research groups will continue to build models differing in scope and level of detail and customized to the biological questions being addressed. Current approaches to model building typically involve de novo creation of complex sets of equations in each paper. A lack of transparency in the underlying assumptions makes it difficult for practitioners, never mind the general research community, to understand how models differ from each other. Fortunately, “rules-based” modeling methods now in development promise to address the issue of model reusability and intelligibility (Faeder et al., 2009; Hlavacek et al., 2006). More rigorous means for linking models to experimental data and for understanding which aspects of a model are supported by the data are required. Progress in this area is slow, but the basic principles are understood in the context of engineering and the physical sciences (Jaqaman and Danuser, 2006). Finally, we must work to ensure basic familiarity with dynamical systems among trainees. It is widely accepted that a working knowledge of statistical methods such as clustering and regression is essential in contemporary biomedicine, but it is unfortunate that few students are taught that familiar Michaelis-Menten equations are simply approximations to a mass-action formalism written as networks of ODEs (Chen et al., 2010).

Biological Challenges

Cancer pharmacology is the area of translational medicine in which models of apoptosis are most obviously of value. Critical questions in the development of rational and personalized treatment of cancer involve understanding precisely how anticancer drugs induce apoptosis, why the extent of cell killing varies so dramatically from one tumor to the next, and how we can predict response to chemotherapy, both “targeted” and cytotoxic. As yet no quantitative, model-based studies of these issues have been reported, but it seems almost certain that sensitivity and resistance will be controlled in a multifactorial manner. Genes and proteins that are important in one cellular setting will not be significant in another. In the case of TRAIL, for example, conventional molecular approaches have implicated the levels of O-glycosylation enzymes (GALNT3, GALNT14; Wagner et al., 2007), TRAIL decoy receptors (DcR1, DcR2, and osteoprotegerin), c-FLIP, BclxL, and inhibitor of apoptosis proteins (IAPs) in TRAIL resistance in different cell lines (reviewed in Zhang and Fang, 2005). It is likely that all of these explanations are correct to some degree, and the key task therefore becomes understanding the role of context. This is precisely where models hold great promise, as they are able to quantify and weigh the contributions of multiple factors. Such context sensitivity could be implemented by using a model in which the topology and rate constants remain the same for all cell types but protein concentrations (initial conditions) are altered to match experimentally measured protein levels.

Ultimately, we need to understand the regulation of apoptosis in the context of real human tissues and tumors. Because mechanistic modeling is dependent on quantitative, multiplex data, this will not be straightforward, even in model organisms. New in vivo caspase activity probes (Edgington et al., 2009) and high-resolution intravital microscopy (Condeelis and Weissleder, 2010) will play an important role in data acquisition in vivo, but it also seems probable that the development of mechanistic models able to store, simulate, and rationalize results obtained across a panel of cancer cell lines will be essential. Such context-sensitive modeling might uncover a multifactorial measurement that could be made on real human tumors. Expression profiling and cancer genome sequencing also aspire to personalize cancer therapy, but the framework we envision is complementary in focusing on biochemical mechanism. A multiplex measurement method (BH3 profiling) already exists to estimate the propensity of cells to undergo apoptosis; it involves permeabilizing cells and then monitoring their responses to diverse BH3-only peptides (Deng et al., 2007). BH3 profiling can predict sensitivity to conventional chemotherapies and to Bcl-2/BclxL antagonist ABT-737 (Deng et al., 2007). It would be valuable to construct a predictive mathematical framework for BH3 profiling and thereby generate precise mechanistic understanding of drug sensitivity and resistance that could be translated clinically.

Single-cell analysis of cellular responses to FasL and TRAIL has highlighted the dramatic impact of cell-to-cell variability in determining the timing and probability of response. That cells surviving exposure to a death ligand or cytotoxic drug can resume normal proliferation is a testament to the “stiff trigger,” “all-or-nothing” nature of the apoptotic switch. Cells that cross the threshold for MOMP are normally fully committed to die, whereas cells that remain below it can recover and continue to proliferate. In the case of receptor-mediated apoptosis, the presence of a dose-dependent variable delay preceding MOMP followed by a dose-independent and nearly invariant post-MOMP period likely reflects the evolutionary advantages of such a system. Variability in the timing and probability of apoptosis makes it possible for a uniform population of cells to respond to a prodeath stimulus in a graded manner, even though the response is binary at the single-cell level. In contrast, by undergoing MOMP and effector caspase activation in a rapid and invariant way, cells avoid the highly deleterious effects of

initiating but not completing apoptosis; these effects include formation of “undead” cells with damaged genomes.

Variability in response appears to be universal across diverse cell lines and proapoptotic stimuli (Cohen et al., 2008; Gascoigne and Taylor, 2008; Geva-Zatorsky et al., 2006; Orth et al., 2008; Sharma et al., 2010; Spencer et al., 2009; Huang et al., 2010). For example, Gascoigne and Taylor (2008) characterized the response of 15 cell lines to three different classes of antimitotic drugs and found significant inter- and intra-cell line variation, with cells exhibiting multiple distinct phenotypes in response to the same treatment. Cohen et al. (2008) correlated variability in the levels of two proteins with the life-or-death response to the cancer drug camptothecin. Most recently, Sharma et al. (2010) detected a small subpopulation of reversibly “drug-tolerant” cells following treatment with cisplatin or the epidermal growth factor receptor inhibitor erlotinib. The significance of these findings is that cancer therapy is beset by the problem of fractional, or incomplete, killing of tumor cells. Multiple explanations have been proposed for fractional killing, including drug insensitivity during certain phases of the cell cycle, genetic heterogeneity, incomplete access of tumor to drug (Chabner and Longo, 2006; Skeel, 2003), and the existence of drug-resistant cancer stem cells (Reya et al., 2001). Single-cell imaging and computational modeling of apoptosis have added to this list cell-to-cell variability in protein levels arising from stochasticity in protein expression (Spencer et al., 2009). A critical task for the future will be to ascertain the relative importance of these processes in determining the extent of fractional killing with real tumors and therapeutic protocols. Because a wide variety of biochemical processes are involved, all operating on different timescales, developing an appropriate quantitative framework will be a key step to better understanding.

ACKNOWLEDGMENTS

The authors thank J. Albeck, J. Bachman, D. Flusberg, S. Gaudet, T. Letai, and C. Lopez for their help and acknowledge NIH grants GM68762 and CA139980 for support.

REFERENCES

Albeck, J., Macbeath, G., White, F., Sorger, P., Lauffenburger, D., and Gaudet, S. (2006). Collecting and organizing systematic sets of protein data. Nat. Rev. Mol. Cell Biol. 7, 803–812.
Albeck, J.G., Burke, J.M., Aldridge, B.B., Zhang, M., Lauffenburger, D.A., and Sorger, P.K. (2008a). Quantitative analysis of pathways controlling extrinsic apoptosis in single cells. Mol. Cell 30, 11–25.
Albeck, J.G., Burke, J.M., Spencer, S.L., Lauffenburger, D.A., and Sorger, P.K. (2008b). Modeling a snap-action, variable-delay switch controlling extrinsic cell death. PLoS Biol. 6, 2831–2852.
Ashall, L., Horton, C.A., Nelson, D.E., Paszek, P., Harper, C.V., Sillitoe, K., Ryan, S., Spiller, D.G., Unitt, J.F., Broomhead, D.S., et al. (2009). Pulsatile stimulation determines timing and specificity of NF-kappaB-dependent transcription. Science 324, 242–246.
Bagci, E.Z., Vodovotz, Y., Billiar, T.R., Ermentrout, G.B., and Bahar, I. (2006). Bistability in apoptosis: roles of bax, bcl-2, and mitochondrial permeability transition pores. Biophys. J. 90, 1546–1559.
Bentele, M., Lavrik, I., Ulrich, M., Stosser, S., Heermann, D.W., Kalthoff, H., Krammer, P.H., and Eils, R. (2004). Mathematical modeling reveals threshold mechanism in CD95-induced apoptosis. J. Cell Biol. 166, 839–851.
Betti, C.J., Villalobos, M.J., Jiang, Q., Cline, E., Diaz, M.O., Loredo, G., and Vaughan, A.T. (2005). Cleavage of the MLL gene by activators of apoptosis is independent of topoisomerase II activity. Leukemia 19, 2289–2295.
Bhola, P.D., and Simon, S.M. (2009). Determinism and divergence of apoptosis susceptibility in mammalian cells. J. Cell Sci. 122, 4296–4302.
Borges, J.L., and Hurley, A. (1999). Collected Fictions (London: Allen Lane The Penguin Press).
Calzone, L., Tournier, L., Fourquet, S., Thieffry, D., Zhivotovsky, B., Barillot, E., and Zinovyev, A. (2010). Mathematical modelling of cell-fate decision in response to death receptor engagement. PLoS Comput. Biol. 6, e1000702.
Chabner, B., and Longo, D.L. (2006). Cancer Chemotherapy and Biotherapy: Principles and Practice, Fourth Edition (Philadelphia: Lippincott Williams & Wilkins).
Chen, C., Cui, J., Lu, H., Wang, R., Zhang, S., and Shen, P. (2007). Modeling of the role of a Bax-activation switch in the mitochondrial apoptosis decision. Biophys. J. 92, 4304–4315.
Chen, W.W., Niepel, M., and Sorger, P.K. (2010). Classic and contemporary approaches to modeling biochemical reactions. Genes Dev. 24, 1861–1875.
Chipuk, J.E., Bouchier-Hayes, L., and Green, D.R. (2006). Mitochondrial outer membrane permeabilization during apoptosis: the innocent bystander scenario. Cell Death Differ. 13, 1396–1402.
Chipuk, J.E., and Green, D.R. (2008). How do BCL-2 proteins induce mitochondrial outer membrane permeabilization? Trends Cell Biol. 18, 157–164.
Cohen, A.A., Geva-Zatorsky, N., Eden, E., Frenkel-Morgenstern, M., Issaeva, I., Sigal, A., Milo, R., Cohen-Saidon, C., Liron, Y., Kam, Z., et al. (2008). Dynamic proteomics of individual cancer cells in response to a drug. Science 322, 1511–1516.
Condeelis, J., and Weissleder, R. (2010). In vivo imaging in cancer. Cold Spring Harb. Perspect. Biol. 2, a003848.
Deng, J., Carlson, N., Takeyama, K., Dal Cin, P., Shipp, M., and Letai, A. (2007). BH3 profiling identifies three distinct classes of apoptotic blocks to predict response to ABT-737 and conventional chemotherapeutic agents. Cancer Cell 12, 171–185.
Dussmann, H., Rehm, M., Concannon, C.G., Anguissola, S., Wurstle, M., Kacmar, S., Voller, P., Huber, H.J., and Prehn, J.H. (2010). Single-cell quantification of Bax activation and mathematical modelling suggest pore formation on minimal mitochondrial Bax accumulation. Cell Death Differ. 17, 278–290.
Edgington, L.E., Berger, A.B., Blum, G., Albrow, V.E., Paulick, M.G., Lineberry, N., and Bogyo, M. (2009). Noninvasive optical imaging of apoptosis by caspase-targeted activity-based probes. Nat. Med. 15, 967–973.
Eissing, T., Conzelmann, H., Gilles, E.D., Allgower, F., Bullinger, E., and Scheurich, P. (2004). Bistability analyses of a caspase activation model for receptor-induced apoptosis. J. Biol. Chem. 279, 36892–36897.
Eissing, T., Allgower, F., and Bullinger, E. (2005). Robustness properties of apoptosis models with respect to parameter variations and intrinsic noise. Syst. Biol. (Stevenage) 152, 221–228.
Fadeel, B., Orrenius, S., and Zhivotovsky, B. (1999). Apoptosis in human disease: a new skin for the old ceremony? Biochem. Biophys. Res. Commun. 266, 699–717.
Faeder, J.R., Blinov, M.L., and Hlavacek, W.S. (2009). Rule-based modeling of biochemical systems with BioNetGen. Methods Mol. Biol. 500, 113–167.
Falschlehner, C., Emmerich, C.H., Gerlach, B., and Walczak, H. (2007). TRAIL signalling: decisions between life and death. Int. J. Biochem. Cell Biol. 39, 1462–1475.
Ferrell, J.E., Jr., and Machleder, E.M. (1998). The biochemical basis of an all-or-none cell fate switch in Xenopus oocytes. Science 280, 895–898.
Fricker, N., Beaudouin, J., Richter, P., Eils, R., Krammer, P.H., and Lavrik, I.N. (2010). Model-based dissection of CD95 signaling dynamics reveals both a pro- and antiapoptotic role of c-FLIPL. J. Cell Biol. 190, 377–389.
Friedman, N., Cai, L., and Xie, X.S. (2006). Linking stochastic dynamics to population distribution: an analytical framework of gene expression. Phys. Rev. Lett. 97, 168302.

Fuentes-Prior, P., and Salvesen, G.S. (2004). The protein structures that shape caspase activity, specificity, activation and inhibition. Biochem. J. 384, 201–232.
Fussenegger, M., Bailey, J.E., and Varner, J. (2000). A mathematical model of caspase function in apoptosis. Nat. Biotechnol. 18, 768–774.
Gascoigne, K.E., and Taylor, S.S. (2008). Cancer cells display profound intra- and interline variation following prolonged exposure to antimitotic drugs. Cancer Cell 14, 111–122.
Geva-Zatorsky, N., Rosenfeld, N., Itzkovitz, S., Milo, R., Sigal, A., Dekel, E., Yarnitzky, T., Liron, Y., Polak, P., Lahav, G., et al. (2006). Oscillations and variability in the p53 system. Mol. Syst. Biol. 2, 2006.0033.
Gillespie, D.T. (1977). Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81, 2340–2361.
Goldstein, J.C., Kluck, R.M., and Green, D.R. (2000a). A single cell analysis of apoptosis. Ordering the apoptotic phenotype. Ann. N Y Acad. Sci. 926, 132–141.
Goldstein, J.C., Waterhouse, N.J., Juin, P., Evan, G.I., and Green, D.R. (2000b). The coordinate release of cytochrome c during apoptosis is rapid, complete and kinetically invariant. Nat. Cell Biol. 2, 156–162.
Goldstein, J.C., Munoz-Pinedo, C., Ricci, J.E., Adams, S.R., Kelekar, A., Schuler, M., Tsien, R.Y., and Green, D.R. (2005). Cytochrome c is released in a single step during apoptosis. Cell Death Differ. 12, 453–462.
Gonzalvez, F., and Ashkenazi, A. (2010). New insights into apoptosis signaling by Apo2L/TRAIL. Oncogene 29, 4752–4765.
Green, D.R. (2011). Means to an End: Apoptosis and Other Cell Death Mechanisms (Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press).
Hellwig, C.T., Kohler, B.F., Lehtivarjo, A.K., Dussmann, H., Courtney, M.J., Prehn, J.H., and Rehm, M. (2008). Real time analysis of tumor necrosis factor-related apoptosis-inducing ligand/cycloheximide-induced caspase activities during apoptosis initiation. J. Biol. Chem. 283, 21676–21685.
Hellwig, C.T., Ludwig-Galezowska, A.H., Concannon, C.G., Litchfield, D.W., Prehn, J.H., and Rehm, M. (2010). Activity of protein kinase CK2 uncouples Bid cleavage from caspase-8 activation. J. Cell Sci. 123, 1401–1406.
Hengartner, M.O. (2000). The biochemistry of apoptosis. Nature 407, 770–776.
Hlavacek, W.S., Faeder, J.R., Blinov, M.L., Posner, R.G., Hucka, M., and Fontana, W. (2006). Rules for modeling signal-transduction systems. Sci. STKE 2006, re6.
Ho, K.L., and Harrington, H.A. (2010). Bistability in apoptosis by receptor clustering. PLoS Comput. Biol. 6, e1000956.
Hoffmann, A., Levchenko, A., Scott, M.L., and Baltimore, D. (2002). The IkappaB-NF-kappaB signaling module: temporal control and selective gene activation. Science 298, 1241–1245.
Hua, F., Cornejo, M.G., Cardone, M.H., Stokes, C.L., and Lauffenburger, D.A. (2005). Effects of Bcl-2 levels on Fas signaling-induced caspase-3 activation: molecular genetic tests of computational model predictions. J. Immunol. 175, 985–995.
Huang, H.C., Mitchison, T.J., and Shi, J. (2010). Stochastic competition between mechanistically independent slippage and death pathways determines cell fate during mitotic arrest. PLoS One 5, e15724.
Jaqaman, K., and Danuser, G. (2006). Linking data to models: data regression. Nat. Rev. Mol. Cell Biol. 7, 813–819.
Jost, P.J., Grabow, S., Gray, D., McKenzie, M.D., Nachbur, U., Huang, D.C., Bouillet, P., Thomas, H.E., Borner, C., Silke, J., et al. (2009). XIAP discriminates between type I and type II FAS-induced apoptosis. Nature 460, 1035–1039.
Kaufmann, S.H., and Earnshaw, W.C. (2000). Induction of apoptosis by cancer chemotherapy. Exp. Cell Res. 256, 42–49.
Kim, K.A., Spencer, S.L., Albeck, J.G., Burke, J.M., Sorger, P.K., Gaudet, S., and Kim do, H. (2010). Systematic calibration of a cell signaling network model. BMC Bioinformatics 11, 202.
Krishna, S., Banerjee, B., Ramakrishnan, T.V., and Shivashankar, G.V. (2005). Stochastic simulations of the origins and implications of long-tailed distributions in gene expression. Proc. Natl. Acad. Sci. USA 102, 4771–4776.
Lavrik, I.N., Golks, A., Riess, D., Bentele, M., Eils, R., and Krammer, P.H. (2007). Analysis of CD95 threshold signaling: triggering of CD95 (FAS/APO-1) at low concentrations primarily results in survival signaling. J. Biol. Chem. 282, 13664–13671.
Lee, T.K., Denny, E.M., Sanghvi, J.C., Gaston, J.E., Maynard, N.D., Hughey, J.J., and Covert, M.W. (2009). A noisy paracrine signal determines the cellular NF-kappaB response to lipopolysaccharide. Sci. Signal. 2, ra65.
Legewie, S., Bluthgen, N., and Herzel, H. (2006). Mathematical modeling identifies inhibitors of apoptosis as mediators of positive feedback and bistability. PLoS Comput. Biol. 2, e120.
Letai, A.G. (2008). Diagnosing and exploiting cancer’s addiction to blocks in apoptosis. Nat. Rev. Cancer 8, 121–132.
Lovell, J.F., Billen, L.P., Bindner, S., Shamas-Din, A., Fradin, C., Leber, B., and Andrews, D.W. (2008). Membrane binding by tBid initiates an ordered series of events culminating in membrane permeabilization by Bax. Cell 135, 1074–1084.
Lovric, M.M., and Hawkins, C.J. (2010). TRAIL treatment provokes mutations in surviving cells. Oncogene 29, 5048–5060.
Luan, D., Zai, M., and Varner, J.D. (2007). Computationally derived points of fragility of a human cascade are consistent with current therapeutic strategies. PLoS Comput. Biol. 3, e142.
Luo, K.Q., Yu, V.C., Pu, Y., and Chang, D.C. (2003). Measuring dynamics of caspase-8 activation in a single living HeLa cell during TNFalpha-induced apoptosis. Biochem. Biophys. Res. Commun. 304, 217–222.
Maas, C., Verbrugge, I., de Vries, E., Savich, G., van de Kooij, L.W., Tait, S.W., and Borst, J. (2010). Smac/DIABLO release from mitochondria and XIAP inhibition are essential to limit clonogenicity of type I tumor cells after TRAIL receptor stimulation. Cell Death Differ. 17, 1613–1623.
Madesh, M., Antonsson, B., Srinivasula, S.M., Alnemri, E.S., and Hajnoczky, G. (2002). Rapid kinetics of tBid-induced cytochrome c and Smac/DIABLO release and mitochondrial depolarization. J. Biol. Chem. 277, 5651–5659.
Nelson, D.E., Ihekwaba, A.E., Elliott, M., Johnson, J.R., Gibney, C.A., Foreman, B.E., Nelson, G., See, V., Horton, C.A., Spiller, D.G., et al. (2004). Oscillations in NF-kappaB signaling control the dynamics of gene expression. Science 306, 704–708.
Neumann, L., Pforr, C., Beaudouin, J., Pappa, A., Fricker, N., Krammer, P.H., Lavrik, I.N., and Eils, R. (2010). Dynamics within the CD95 death-inducing signaling complex decide life and death of cells. Mol. Syst. Biol. 6, 352.
Niepel, M., Spencer, S.L., and Sorger, P.K. (2009). Non-genetic cell-to-cell variability and the consequences for pharmacology. Curr. Opin. Chem. Biol. 13, 556–561.
Orth, J.D., Tang, Y., Shi, J., Loy, C.T., Amendt, C., Wilm, C., Zenke, F.T., and Mitchison, T.J. (2008). Quantitative live imaging of cancer and normal cells treated with Kinesin-5 inhibitors indicates significant differences in phenotypic responses and cell fate. Mol. Cancer Ther. 7, 3480–3489.
Ozbudak, E.M., Thattai, M., Lim, H.N., Shraiman, B.I., and Van Oudenaarden, A. (2004). Multistability in the lactose utilization network of Escherichia coli. Nature 427, 737–740.
Raj, A., and van Oudenaarden, A. (2008). Nature, nurture, or chance: stochastic gene expression and its consequences. Cell 135, 216–226.
Rehm, M., Dussmann, H., Janicke, R.U., Tavare, J.M., Kogel, D., and Prehn, J.H. (2002). Single-cell fluorescence resonance energy transfer analysis demonstrates that caspase activation during apoptosis is a rapid process. Role of caspase-3. J. Biol. Chem. 277, 24506–24514.
Rehm, M., Dussmann, H., and Prehn, J.H. (2003). Real-time single cell analysis of Smac/DIABLO release during apoptosis. J. Cell Biol. 162, 1031–1043.
Korsmeyer, S.J., Shutter, J.R., Veis, D.J., Merry, D.E., and Oltvai, Z.N. (1993).
Rehm, M., Huber, H.J., Dussmann, H., and Prehn, J.H. (2006). Systems anal- Bcl-2/Bax: a rheostat that regulates an anti-oxidant pathway and cell death. ysis of effector caspase activation and its control by X-linked inhibitor of Semin. Cancer Biol. 4, 327–332. apoptosis protein. EMBO J. 25, 4338–4349.

938 Cell 144, March 18, 2011 ª2011 Elsevier Inc. Rehm, M., Huber, H.J., Hellwig, C.T., Anguissola, S., Dussmann, H., and Thornberry, N.A., Rano, T.A., Peterson, E.P., Rasper, D.M., Timkey, T., Garcia- Prehn, J.H. (2009). Dynamics of outer mitochondrial membrane permeabiliza- Calvo, M., Houtzager, V.M., Nordstrom, P.A., Roy, S., Vaillancourt, J.P., et al. tion during apoptosis. Cell Death Differ. 16, 613–623. (1997). A combinatorial approach defines specificities of members of the cas- Reya, T., Morrison, S.J., Clarke, M.F., and Weissman, I.L. (2001). Stem cells, pase family and granzyme B. Functional relationships established for key cancer, and cancer stem cells. Nature 414, 105–111. mediators of apoptosis. J. Biol. Chem. 272, 17907–17911. Scaffidi, C., Fulda, S., Srinivasan, A., Friesen, C., Li, F., Tomaselli, K.J., Deba- Tyas, L., Brophy, V.A., Pope, A., Rivett, A.J., and Tavare, J.M. (2000). Rapid tin, K.M., Krammer, P.H., and Peter, M.E. (1998). Two CD95 (APO-1/Fas) caspase-3 activation during apoptosis revealed using fluorescence-reso- signaling pathways. EMBO J. 17, 1675–1687. nance energy transfer. EMBO Rep. 1, 266–270. Schile, A.J., Garcia-Fernandez, M., and Steller, H. (2008). Regulation of Vaughan, A.T., Betti, C.J., and Villalobos, M.J. (2002). Surviving apoptosis. apoptosis by XIAP ubiquitin-ligase activity. Genes Dev. 22, 2256–2266. Apoptosis 7, 173–177. Schutze, S., Tchikov, V., and Schneider-Brachert, W. (2008). Regulation of TNFR1 and CD95 signalling by receptor compartmentalization. Nat. Rev. Vaughan, A.T., Betti, C.J., Villalobos, M.J., Premkumar, K., Cline, E., Jiang, Q., Mol. Cell Biol. 9, 655–662. and Diaz, M.O. (2005). Surviving apoptosis: a possible mechanism of benzene- 153-154 Sharma, S.V., Lee, D.Y., Li, B., Quinlan, M.P., Takahashi, F., Maheswaran, S., induced leukemia. Chem. Biol. Interact. , 179–185. McDermott, U., Azizian, N., Zou, L., Fischbach, M.A., et al. (2010). A chro- Villalobos, M.J., Betti, C.J., and Vaughan, A.T. (2006). 
Detection of DNA matin-mediated reversible drug-tolerant state in cancer cell subpopulations. double-strand breaks and chromosome translocations using ligation-medi- 141 Cell , 69–80. ated PCR and inverse PCR. Methods Mol. Biol. 314, 109–121. Sigal, A., Milo, R., Cohen, A., Geva-Zatorsky, N., Klein, Y., Liron, Y., Rosenfeld, N., Danon, T., Perzov, N., and Alon, U. (2006). Variability and memory of protein von Ahsen, O., Renken, C., Perkins, G., Kluck, R.M., Bossy-Wetzel, E., and levels in human cells. Nature 444, 643–646. Newmeyer, D.D. (2000). Preservation of mitochondrial structure and function after Bid- or Bax-mediated cytochrome c release. J. Cell Biol. 150, 1027–1036. Skeel, R.T. (2003). Handbook of Cancer Chemotherapy, Sixth Edition (Phila- delphia: Lippincott Williams & Wilkins). Wagner, K.W., Punnoose, E.A., Januario, T., Lawrence, D.A., Pitti, R.M., Lan- Spencer, S.L., Gaudet, S., Albeck, J.G., Burke, J.M., and Sorger, P.K. (2009). caster, K., Lee, D., von Goetz, M., Yee, S.F., Totpal, K., et al. (2007). Death- Non-genetic origins of cell-to-cell variability in TRAIL-induced apoptosis. receptor O-glycosylation controls tumor-cell sensitivity to the proapoptotic Nature 459, 428–432. ligand Apo2L/TRAIL. Nat. Med. 13, 1070–1077. Stennicke, H.R., Renatus, M., Meldal, M., and Salvesen, G.S. (2000). Internally Wurstle, M.L., Laussmann, M.A., and Rehm, M. (2010). The caspase-8 dimer- quenched fluorescent peptide substrates disclose the subsite preferences of ization/dissociation balance is a highly potent regulator of caspase-8, -3, -6 350 human caspases 1, 3, 6, 7 and 8. Biochem. J. , 563–568. signaling. J. Biol. Chem. 285, 33209–33218. Sun, T., Lin, X., Wei, Y., Xu, Y., and Shen, P. (2010). Evaluating bistability of Bax Zhang, L., and Fang, B. (2005). Mechanisms of resistance to TRAIL-induced activation switch. FEBS Lett. 584, 954–960. apoptosis in cancer. Cancer Gene Ther. 12, 228–237. 
Tait, S.W., Parsons, M.J., Llambi, F., Bouchier-Hayes, L., Connell, S., Munoz- Pinedo, C., and Green, D.R. (2010). Resistance to caspase-independent cell Zheng, Q., and Ross, J. (1991). Comparison of deterministic and stochastic death requires persistence of intact mitochondria. Dev. Cell 18, 802–813. kinetics for nonlinear systems. J. Chem. Phys. 94, 3644–3648.

Cell 144, March 18, 2011 ©2011 Elsevier Inc.

Leading Edge Review

Pattern, Growth, and Control

Arthur D. Lander1,*
1Department of Developmental and Cell Biology and Department of Biomedical Engineering, Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697-2300, USA
*Correspondence: [email protected]
DOI 10.1016/j.cell.2011.03.009

Systems biology seeks not only to discover the machinery of life but to understand how such machinery is used for control, i.e., for regulation that achieves or maintains a desired, useful end. This sort of goal-directed, engineering-centered approach also has deep historical roots in developmental biology. Not surprisingly, developmental biology is currently enjoying an influx of ideas and methods from systems biology. This Review highlights current efforts to elucidate design principles underlying the engineering objectives of robustness, precision, and scaling as they relate to the developmental control of growth and pattern formation. Examples from vertebrate and invertebrate development are used to illustrate general lessons, including the value of integral feedback in achieving set-point control; the usefulness of self-organizing behavior; the importance of recognizing and appropriately handling noise; and the absence of “free lunch.” By illuminating such principles, systems biology is helping to create a functional framework within which to make sense of the mechanistic complexity of organismal development.

Introduction

The practice of developmental biology is much like re-reading a good book. Even when the ending is well known, much can be learned from exploring the plot twists and character development that bring it about. The tendency to see developmental events as being inevitably directed toward fixed, predetermined ends is deeply ingrained in developmental biology, a habit that is understandable, given the remarkable abilities of embryos to come out normally after drastic manipulations. “Embryonic regulation” has fascinated scientists since the 19th century, when Driesch derived normally patterned sea urchin larvae from single embryo blastomeres. Extending this concept to genetic, as opposed to surgical, manipulation is Waddington’s notion of canalization, the idea that the normal phenotype has been selected to be especially insensitive to genetic variation.

In the modern developmental biology literature, terms like robustness and precision are finding increasing use. Robustness is the further generalization of canalization to include insensitivity to all kinds of perturbations, environmental and genetic. Precision, the magnitude of natural variation in developmental outcomes, is a measure of robustness with respect to natural perturbations (e.g., standing genetic variation, normal environmental fluctuations, and the randomness of biochemical processes).

The frequency and degree with which embryonic regulation, canalization, robustness, and precision are encountered in development raises many questions. Is there a common principle underlying all such phenomena? Are there conserved mechanisms? Can we explain how (and why) such processes evolved? These questions have, of course, been around for a very long time. What’s new these days is an influx of ideas and concepts from methodologies outside of traditional biology, including control theory, information theory, and network analysis. These methods are being applied to biological systems with the aim of elucidating underlying “design principles,” a goal well suited to the investigation of processes that must achieve desired ends. This Review focuses on current progress in understanding developmental control and the influence that systems biology is having on such work. Studies on a variety of animal species (Figure 1) are discussed below; however, it should be noted that systems biology approaches to plant morphogenesis are also currently bearing (if the pun may be forgiven) considerable fruit (e.g., Jiao and Meyerowitz, 2010; Sahlin et al., 2011).

Complexity, Performance, and Control

A complex system can be defined as any system in which large enough numbers of elements interact in simple ways to produce nonobvious behavior. There are two types of such systems: those that are complex by chance, and those that are complex by necessity. The former are often studied by physicists and typically involve situations in which orderly properties of matter at one level of description emerge out of collective chaos at lower ones. Such emergent properties are often summarized in terms of physical laws like the Universal Gas Law or Fick’s Laws of Diffusion.

The second type of complex system is encountered by engineers, who design systems to meet specific performance objectives. When engineered systems are dynamic (changing in time), and the performance goals require control (steering behavior toward desired goals), the numbers and types of interacting components can quickly reach the point at which system behavior is sufficiently nonobvious that sophisticated mathematics or computer simulation is required to understand and predict it.

There is clearly a strong affinity between this second type of complexity, deriving from dynamics and control, and the
picture developmental biologists have of embryos: dynamic, yet reliably achieving prespecified ends. Indeed, more than 60 years ago, developmental biologists were already suggesting that “the complex engineering performances of technology are a much more pertinent model of the nature of morphogenesis than are the more elementary phenomena dealt with in basic physics and chemistry” (Weiss, 1950). Yet in those days, conditions were not right for exploiting the natural affinity between engineering and morphogenesis. As we shall see, there are essentially two types of engineering: forward engineering, which involves knowing a set of performance objectives and building a system that fulfills them, and reverse engineering, which involves knowing how a system is built and inferring the performance objectives that necessitated it being built that way. Throughout most of the 20th century, developmental biologists were not ready to do either. Now they are increasingly doing both.

Figure 1. Control Objectives in Morphogenesis
The figure compares some of the experimental systems, discussed in this Review, that are being used to study developmental regulation, canalization, robustness, and precision. The Drosophila wing imaginal disc (A) is an excellent model for both pattern formation (through the action of long-range morphogens, such as Hedgehog, Decapentaplegic [Dpp], and Wingless) and growth control. The wing disc demonstrates both scaling of pattern to size and scaling of size to pattern. The early Xenopus embryo (B) provides an excellent system for studying pattern formation in the absence of growth, as well as the scaling of pattern to size. Pattern formation has been extensively studied along the anteroposterior axis (C) and the dorsoventral axis (D) of the Drosophila embryo. Anteroposterior patterning is initiated by the transcription factor Bicoid, which acts as a long-range morphogen within the cytoplasm of the syncytial early embryo, controlling a cascade of long-range and self-organizing events that segment the embryo into specific regions (stripes). Dorsoventral patterning utilizes the long-range morphogen Dpp to trigger, among other things, a self-organizing process at the dorsal midline. Self-organization also characterizes the mechanism by which narrow, straight veins are positioned on the Drosophila wing (E) during the pupal stage. The development of pigment stripes in teleost fish, such as the zebrafish (F), provides another opportunity to investigate self-organizing patterns, especially in the context of regeneration, the experimental investigation of which has shed new light on mechanism. Mammalian brain (G) and muscle (H) are good models of organ size control; in the case of muscle, genetic studies have revealed a critical role for feedback from chalones. Feedback regulation of growth has long been known about through studies on regeneration of the mammalian liver (I). More recently, studies in the mouse olfactory epithelium (J) have shed light on mechanisms underlying feedback control of both size and regenerative speed. Analogous mechanisms appear to be at work in mammalian hematopoiesis (K). Other excellent experimental systems, not shown here, include the early vertebrate spinal cord and hindbrain (pattern formation); vertebrate limb buds (pattern formation, growth control); vertebrate and invertebrate retinas (growth control); and plant shoot apical meristems (pattern formation). Figure 1J courtesy of Kim Gokoffski and Anne Calof.

Forward Engineering Pattern

If we understand “performance objectives” in biology as corresponding to whatever natural selection selects for (i.e., what evolutionary biologists call “fitness”), then we see that one problem with forward engineering is that biologists rarely have a thorough understanding of what contributes to fitness (except perhaps for unicellular organisms in simple environments). Moreover, even if the performance objectives that drove the evolution of an individual biological system were known, there is no guarantee that a forward engineering approach would come up with the same solution as nature. Ask an engineer to build a bridge, and it may not look like any other bridge. This point is illustrated by the history of Turing patterns in developmental biology.

The name Turing pattern derives from Alan Turing’s seminal paper, which also introduced the word morphogen (Turing, 1952). It describes a solution to the general problem of creating repeating patterns in space. Through further elaboration of Turing’s work (Gierer and Meinhardt, 1972; Meinhardt and Gierer, 2000), we know that such patterns tend to arise in systems of spatially arrayed, equivalent components (e.g., cells) when they produce both “activating” and “inhibiting” signals that spread at different rates. Depending upon the details, steady states may be reached in which peaks and troughs of signal production occur in repeated patterns of spots or stripes (Figure 2A). Turing patterns exemplify a class of mechanisms
termed “self-organizing” because the location and spacing of elements emerge out of local interactions, and not through instructions that come from elsewhere.

Initial hopes that Turing processes would provide a simple explanation for all the periodic patterns of development (from skin markings to seashell patterns to embryonic segmentation) have not been realized. Particularly with respect to early, high-precision events, such as the specification of embryonic segments, 30 years of intensive experimental genetics has failed to produce simple, diffusible activator/inhibitor pairs for such cases. Instead, such work has tended to support the view that pattern is organized by morphogens that form long-range gradients from which cells learn their positions. Such systems are boundary organized (Figure 2B), meaning that positional information is encoded in one or more boundaries, with morphogens passively conveying that information across a field of cells.

Figure 2. Two Modes of Organization in the Control of Pattern
The performance objectives of patterning systems include both controlling the locations of events relative to each other and controlling them relative to prespecified landmarks. Turing patterns are one example of self-organizing patterns (A). Repeated patterns form spontaneously and exhibit spacings that depend primarily upon the details of local signal activation, inhibition, and spread, with relatively little influence from events outside the system. In contrast, long-range morphogen gradients typify boundary-driven organization (B). They inform cells of their location relative to fixed landmarks. In both cases, morphogens establish a characteristic “length scale” or “wavelength.” In the first case (A), pattern is a direct reflection of that scale, such that elements (spots or stripes) occur once per length scale. In the second case (B), the length scale simply determines how gradually “positional information” decays over space; where pattern elements occur (blue, red, green blocks) depends upon how cells interpret the positional information they receive.

Interest in self-organizing patterns is, however, very much on the rise today, partly because of recent evidence for the involvement of Turing processes in left-right axis specification, skeletal patterning in the vertebrate limb, the patterning of mammalian and avian ectodermal organs, skin pigmentation patterns, branching morphogenesis in the lung, and hydra head regeneration (reviewed by Kondo and Miura, 2010). Recent work on skin pigmentation patterns in fish (Yamaguchi et al., 2007) has been particularly instructive because it takes advantage of the fact that self-organizing mechanisms are inherently regulative, i.e., they can locally repair themselves. Moreover, the precise way in which pigment stripes respond to surgical manipulation in the fish is strongly indicative of a Turing process. As we shall see later, the regulative nature of self-organizing pattern can both help and hinder robust patterning, a fact that may explain why boundary-organized mechanisms are also needed in pattern formation.

Work on fish pigmentation patterns also emphasizes the fact that the creation of Turing patterns does not necessarily require secreted, diffusible activators and inhibitors (Kondo and Miura, 2010). The Turing process is a mathematical abstraction that invokes the production and destruction of interacting, moving signals. No restrictions are imposed on the molecular details of the signals, how they move, or how they interact. It is possible that the true prevalence of Turing patterns in development has been underestimated because biologists have been too focused on looking for particular kinds of molecules, rather than general design principles. From this we can see both the strength and weakness of the forward engineering approach in biology: it provides a direct route to design principles, but it cannot tell us how those principles are implemented in real biological systems.

Reverse Engineering Growth

The classic definition of a reverse engineer is the industrial spy who, using only stolen blueprints, figures out what a competitor’s product does. Unlike forward engineering, which progresses from performance objectives to design, reverse engineering starts with design and seeks to learn performance objectives. To do this, the engineer must either use pre-existing knowledge of design principles or use modeling and/or simulation to explore the sorts of behaviors a system is capable of, in the hope of recognizing performance that might be useful or desirable.

Reverse engineering requires extensive knowledge of a system’s “wiring diagram,” which is one reason why opportunities to do it were rare in biology until the advent of comprehensive data-gathering methodologies such as genomics, proteomics, saturation mutagenesis, et cetera. Yet this is only half the reason why reverse engineering is such a prominent activity in systems biology. The other is that the goal of reverse engineering, to learn performance objectives, fills in just the kind of information that traditional molecular genetics cannot: what the components of a system are for, as opposed to merely what they do. The more massive the biological system, the more important such insight is.

Among the processes that systems biologists have reverse engineered are metabolism, cell-cycle control, stress responses, and bacterial chemotaxis (Alon et al., 1999; Csikasz-Nagy et al., 2008; Khammash, 2008; Sauro and Kholodenko, 2004). The first explicit attempts to reverse engineer complex developmental systems date to Odell’s work on the network of signaling and
gene regulation that establishes segment polarity in the Drosophila embryo, and on Notch signaling in insect neurogenesis (Meir et al., 2002; von Dassow et al., 2000). In both cases, it was proposed that system design was influenced by a need for robustness to parameter uncertainty and internal noise. This work has been followed by many studies from other groups exploring ways in which other known mechanisms of pattern formation can also be robust (reviewed by Barkai and Shilo, 2009; Eldar et al., 2004; Lander et al., 2009b).

Patterning is only one of two fundamental processes in morphogenesis, the other being growth. As growth is often a consequence of cell proliferation, this Review equates growth control with control of proliferation, aware, of course, that proliferation can occur without growth (e.g., in early embryos) or with much delayed growth. That growth is under tight control is supported by the precision observed in the sizes of organisms and their parts. For example, when genetic variability is controlled for, adult mouse brains vary only about 5% in size and cell number (Williams, 2000). For bilaterally symmetric organs (such as limbs), left-to-right variance in size is similarly very small (Wolpert, 2010).

Such precision is impressive in light of the fact that proliferation, being an exponential process, compounds its errors. A mere 2% decrease in cell-cycle length will, over 30 cell cycles, cause a >50% increase in the size of a growing population. It is unlikely that the necessary cell-cycle precision to achieve normal organ and body size control can be achieved without some sort of feedback process. Indeed, the idea that negative feedback is involved in organ size control received early support from studies of liver regeneration and from in vitro studies showing that many cell types produce substances that suppress their own proliferation (reviewed by Elgjo and Reichelt, 2004).

Work on such substances, chalones, did not significantly take off until the late 1990s, when it was found that mice deficient in the TGF-β family member GDF8 (myostatin) produced an excess of skeletal muscle. GDF8 is made by muscle and acts on muscle progenitors, thus fulfilling the requirements of a chalone. Subsequently, GDF11, a close homolog of GDF8, was found to exhibit analogous effects in a self-renewing neural tissue, the mouse olfactory epithelium (Wu et al., 2003). Other molecules have recently been suggested to act as chalones in a variety of tissues (reviewed by Lander et al., 2009a).

The basic chalone model, in which chalones slow the proliferation of progenitors by an amount directly related to organ or tissue size, is too generic for reverse engineering. What’s needed is an actual wiring diagram of how such feedback is implemented in a real organ. Progress toward this end was made in studies of the olfactory epithelium, where progenitors pass through distinct lineage stages and GDF11 acts only at a very specific stage to influence the behavior of an apparent transit-amplifying cell located between a stem cell and a differentiated neuron (Wu et al., 2003). It was subsequently found that activin B, another TGF-β family member, is also expressed in the olfactory epithelium and also has a negative effect on proliferation but acts uniquely on the stem cell stage, and not the transit-amplifying cell. Reverse engineering of this system (Lander et al., 2009a) entailed mathematically exploring what performance objectives could potentially be met by a multiplicity of progenitor cell stages, a multiplicity of feedback factors, and the specificity of factors for single lineage stages.

This analysis produced several useful results, and one of the most important was negative: A chalone that acts by slowing the divisions of an intermediate cell in the lineage of a self-renewing tissue should not be able to have any effect on steady-state tissue size. This suggested that the mechanism of action of GDF11, which targets just such a lineage stage, must involve more than just suppressing cell divisions. This in turn led to experiments showing that GDF11 also controls the renewal probability of its target cell, i.e., the probability that the target cell’s progeny remain of the same type instead of progressing to the next lineage stage (Lander et al., 2009a).

Once this additional mechanism was taken into account, calculations showed that not only could GDF11 influence the tissue’s steady state, it could control it with near-perfect robustness. For instance, the steady state became robust to cell-cycle speeds, initial numbers of stem cells, and rates of cell death. Moreover, this feedback arrangement also creates a mechanism for triggering extremely rapid regeneration after injury. However, both performance objectives could not be met under the same conditions, unless additional feedback (in this case from activin) onto the stem cell was also included. Thus, reverse engineering suggests that the detailed interaction of lineage, feedback, and regulation of self-renewal found in the olfactory epithelium constitutes a system for simultaneous robust size control and rapid regeneration (Lander et al., 2009b; Lo et al., 2009).

The Value of Integral Feedback

Around the same time as the above studies on the olfactory epithelium, two groups independently concluded that feedback control of cell number in hematopoiesis also occurs primarily through the regulation of self-renewal, i.e., through control of lineage progression, as opposed to control of cell-cycle speed. In one case, the conclusion was supported by the dynamics of regenerative responses following bone marrow transplantation (Marciniak-Czochra et al., 2009). The other (Kirouac et al., 2009) derived the result from a combination of model exploration (a systematic approach to reverse engineering) and model fitting (using computational algorithms to extract parameter values from in vitro and in vivo data).

The evidence that feedback specifically targets progenitor self-renewal in multiple systems suggests that there is some generically useful feature associated with this mechanism. Indeed, inspection shows that it is a straightforward implementation of an engineering strategy known as integral feedback control. Essentially, integral feedback control describes the strategy of feeding back into a system a signal that is proportional to the time integral of the difference between the system’s current behavior and its desired behavior (Figure 3). Integral control is observed in other biological systems (such as bacterial chemotaxis) (Figure 3A) and appears to be generically essential whenever feedback must maintain a desired output exactly (set-point control); this explains why it can robustly maintain self-renewing tissues at a predetermined size. In contrast, feedback regulation of the rate of progenitor progression through the cell cycle amounts to what engineers call proportional control (feedback regulation of cell death also amounts to
proportional control).

Precisely what negative feedback factors are responsible for such behavior in the brain remains unknown. Factors such as GDF8, GDF11, and activin are present in many locations throughout the nervous system. In the retina, however, loss of gdf11 leads not to a change in tissue size but to marked alterations in the proportions of neuronal cell types produced, with some expanding at the expense of others (Kim et al., 2005). In the neural retina, a single progenitor cell type is thought to give rise to all the differentiated cells, suggesting that GDF11’s effects extend not just to whether progenitor cells renew or differentiate but also to their choice of what cell type to differentiate into.

The Range of Control

Notwithstanding their likely importance in regulating tissue growth and cellular composition, secreted negative feedback factors can, at best, be only part of the picture. Notch signaling, for example, also influences cell proliferation and controls the fate choices of progenitors (Artavanis-Tsakonas et al., 1999). Through lateral inhibition, Notch can ensure that precisely one progenitor can arise within a particular region of space, a sort of short-range set-point control. At the opposite extreme of range of action are circulating feedback inhibitors, which parabiosis experiments long ago implicated in liver size control (Moolten and Bucher, 1967).

Indeed, every use of feedback for control in development has a characteristic spatial range. For example, secreted polypeptide growth factors (e.g., chalones) are thought to act within epithelial tissues at ranges up to a few hundred microns, due to the depleting effects of receptor-mediated uptake (e.g., Lander et al., 2009a; Shvartsman et al., 2001). How could such molecules integrate size information over the much larger scale of macroscopic organs? One possibility is that they act at an early stage of development, when dimensions are smaller. As organ growth proceeds, the control provided by feedback would become more and more locally autonomous. This would still allow for an accurate global response to perturbations that affect all locations equally (e.g., genetic variability, changes in body temperature, nutritional status), but not to local disruptions (e.g., physical damage to a part of the growing tissue would not elicit compensatory growth elsewhere). This seems a good framework for thinking about the specification of limb size, which is remarkably precise yet created out of the actions of

Figure 3. Versatility of Integral Feedback Control
Integral feedback is particularly useful for achieving set-point control, in which a system achieves a prespecified steady-state behavior independent of external (and often many internal) perturbations. The essence of integral control is to feed back a signal that reflects the time integral of error (the difference between the actual and desired states of the system). Biological systems often use this type of control to achieve robust, perfect adaptation, i.e., to return to a zero-activity state even after sustained perturbations. For example, in bacterial chemotaxis (A), integral feedback adaptively modulates signaling to maximize sensitivity to changes in chemoattractant levels (Alon et al., 1999; Yi et al., 2000). Integral feedback in the control of cell growth has been described for two distinct systems. Production of chalones, such as GDF11, by differentiated cells in the olfactory epithelium inhibits progenitor self-renewal (B), providing a feedback signal that increases (decreases) in time as long as the probability of progenitor cell renewal is greater (lesser) than 50% (Lander et al., 2009a). Mechanical compression within the Drosophila wing disc increases with disc size (C), potentially providing a growth inhibitory signal that increases in time as long as cells are proliferating (Shraiman, 2005). Integral feedback can also be used to make a morphogen gradient scale to fit the territory between its source of production and a distant boundary (D). In this case, the morphogen inhibits the production of a molecule that acts at long range to expand the range (length scale) of the morphogen (Ben-Zvi and Barkai, 2010). In such a scenario, buildup of the expander over time provides a time-integrated error signal, which only vanishes when the morphogen gradient expands all the way to the distant boundary.
Although proportional control can provide parts that exhibit considerable growth autonomy (discussed some compensation for disturbances, it generically does not by Pan, 2007; Wolpert, 2010). Such observations do not imply restore a perturbed system to a set-point. that growth control is achieved without global feedback, but The same integral control mechanism that achieves robust simply that global feedback may occur early (e.g., in the limb maintenance of a steady state in constantly renewing lineages, bud instead of in the limb). This makes an important general as in the olfactory epithelium or in hematopoiesis, can also point about developmental precision: the machinery for control provide for robust final size specification in nonrenewing tissues is needed only at times when relevant perturbations tend to such as the brain. Consistent with this, the pattern of gradual happen. progenitor pool expansion, contraction, and extinction that One possible solution to control growth on many spatial scales occurs in the developing brain closely follows the expected is to combine strategies. This seems to happen in the olfactory consequences of negative feedback control of progenitor self- epithelium because tissue size along the apicobasal dimension renewal (Lander et al., 2009a). Indeed, measurements of progen- of the epithelium (<100 mm in thickness) is highly sensitive to itor self-renewal probabilities in the cerebral cortex show just the mutations that alter GDF11 expression or function, but lateral predicted steady decline that negative feedback control should expansion of the epithelium into surrounding connective tissue produce (Nowakowski et al., 2002). (over many millimeters) is less sensitive (Kawauchi et al., 2009).
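The earlier contrast between integral and proportional feedback control can be illustrated with a toy simulation. The sketch below is not a model from any of the cited studies: it assumes a hypothetical one-variable system with first-order decay and a constant disturbance, and the names `simulate`, `gain`, `decay`, and `disturbance` are illustrative choices. Under these assumptions, proportional feedback settles at a value that depends on the disturbance, whereas integral feedback returns the output to the set-point.

```python
def simulate(controller, disturbance, setpoint=1.0, decay=1.0,
             gain=5.0, dt=0.001, t_end=50.0):
    """Euler-integrate dx/dt = -decay*x + u + disturbance; return the final x.

    All parameters are hypothetical and dimensionless; x stands for an
    abstract system output (for example, a tissue size).
    """
    x = 0.0          # system output
    integral = 0.0   # running time integral of the error
    for _ in range(int(t_end / dt)):
        error = setpoint - x
        integral += error * dt
        if controller == "proportional":
            u = gain * error       # feedback proportional to the current error
        elif controller == "integral":
            u = gain * integral    # feedback proportional to the accumulated error
        else:
            raise ValueError(f"unknown controller: {controller}")
        x += (-decay * x + u + disturbance) * dt
    return x


if __name__ == "__main__":
    for d in (0.0, 0.5):
        p = simulate("proportional", d)
        i = simulate("integral", d)
        print(f"disturbance={d}: proportional -> {p:.3f}, integral -> {i:.3f}")
```

In this sketch the proportional controller settles at (gain × setpoint + disturbance) / (decay + gain), so its output shifts whenever the disturbance changes. The integral controller can only come to rest when the error is zero, which is the set-point property described in the text.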

So far, molecular mechanisms controlling planar expansion have been little explored in this tissue, although there are reasons to suspect that regulation of fibroblast growth factor activity is involved (Kawauchi et al., 2005).

One tissue that has been the focus of a great deal of experimental work on epithelial planar expansion is the Drosophila larval wing imaginal disc. Growth of the wing disc is influenced by signals on multiple length scales (Edgar, 2006; Martin-Castellanos and Edgar, 2002; Nijhout and Grunert, 2010; Schwank and Basler, 2010): cell-to-cell, compartment-wide, disc-wide, and humoral (hormonal). Recent theoretical and experimental work (Aegerter-Wilmsen et al., 2007, 2010; Hufnagel et al., 2007; Nienhaus et al., 2009) suggests that disc-wide coordination may be mediated, at least in part, by mechanical feedback (the length scale of which can be very long, depending upon the viscoelastic properties of the tissue). The influence of mechanical effects (tension, compression) on cell growth is well established for mammalian cells (Mammoto and Ingber, 2009). In the wing disc, it has been pointed out that such mechanical effects create an opportunity for disc-wide integral feedback control (Shraiman, 2005) (Figure 3C).

The wing disc has also been instrumental in shedding light on the role of the Hippo signaling pathway (also known as the Salvador/Warts/Merlin pathway) in growth control. The molecular details of the Hippo pathway, although still emerging, have been reviewed elsewhere (e.g., Buttitta and Edgar, 2007; Grusche et al., 2010b; Halder and Johnson, 2011; Pan, 2007; Reddy and Irvine, 2008) and will not be reiterated, except to say that a major source of input into the pathway is the cell-surface protocadherin Fat, and the major intracellular target of Hippo signaling seems to be the growth-stimulating transcriptional coactivator Yorkie (Yki; the fly homolog of vertebrate YAP), which Hippo signaling inactivates. Precisely what the crucial targets of Yki are is unknown, but recent studies suggest that, like GDF11, its functions include the regulation of self-renewal, and not just the rate at which cells traverse the cell cycle (Halder and Johnson, 2011).

As with Notch signaling, the fact that the Hippo pathway is activated by cell-surface ligands and receptors (the only known ligand for Fat is Dachsous, also a cell-surface protocadherin) suggests a one-cell range of action. Indeed, recent work suggests that the Fat pathway is central in mediating the well-known in vitro phenomenon of “contact inhibition of cell growth” (Zhao et al., 2007), an example of growth control with a spatial scale of the single cell. Recent studies also find the Hippo pathway to be essential for compensatory cell proliferation after injury, a form of local regenerative response. Interestingly, genetic studies aimed at identifying major determinants of organ size have, in both vertebrates and invertebrates, consistently implicated the Hippo pathway, far more so than the pathways controlled by classical, diffusible growth factors (reviewed by Halder and Johnson, 2011; Pan, 2007). This suggests that the spatial range of Hippo signaling may, sometimes, be quite large. Several mechanisms have emerged for how that might be achieved.

First, Yki activation has been shown, in the Drosophila midgut, to lead to the production of diffusible growth factors and cytokines that stimulate proliferation in neighboring cells (Ren et al., 2010; Staley and Irvine, 2010). Second, it appears that Yki can be strongly activated by a spatial bias in the occupancy of Fat on one side of a cell versus another, and that such bias can be propagated from cell to cell, in a fashion similar to the way in which cell polarity is propagated from cell to cell by the planar cell polarity pathway (with which the Fat/Hippo pathway shares some components) (Reddy and Irvine, 2008). In addition to evidence that Hippo signaling can act over ranges of several cells, there is evidence that diffusible cues, including bone morphogenetic proteins (BMPs) (Rogulja et al., 2008) and factors that activate Jun kinase (Sun and Irvine, 2011), also act as direct inputs into the regulation of Yki activity. Together these data suggest that Hippo signaling may integrate and produce both short- and long-range signals.

Scaling: Matching Pattern to Growth

From the earliest days of embryology, it has been clear that the remarkable robustness of pattern formation is primarily manifest at the level of relative, not absolute, pattern (i.e., the locations of pattern elements relative to the size of the tissue being patterned). Thus, the patterns that arise in sea urchin embryos derived from isolated blastomeres, or frogs derived from half-embryos, are only normal in proportion to an abnormal size.

The need for patterning that automatically scales to tissue or body size arises from the fact that growth control mechanisms are both robust and adaptive, i.e., given constant genetic and environmental conditions, they robustly specify size set-points, but those set-points are themselves influenced by other factors (e.g., nutrition, temperature, genetics, timing). A good example of adaptable scaling can be found in the Dpp gradient of the Drosophila wing disc, which displays a roughly constant absolute shape throughout the period of normal disc growth (Hufnagel et al., 2007), yet it strongly scales in response to experimental manipulations that increase or decrease disc size (Teleman and Cohen, 2000). These observations suggest that there is a scaling set-point, but it varies with developmental stage.

Developmental biologists have long sought to identify mechanisms responsible for automatic scaling. Indeed, much early enthusiasm for Wolpert’s original model of morphogens as molecules produced at a source and degraded at a distant sink (Wolpert, 1969) stemmed from the automatic scaling that such an arrangement achieves (because the morphogen profile is a straight line from source to sink; changes in the location of the sink shift all threshold locations proportionally). As it happens, virtually no known morphogen gradients are made by this source-sink mechanism, as the requirement that morphogens be sensed by cells usually ensures that they are degraded throughout their field of action, rather than just at one end. Because such gradients do not scale automatically, various attempts have been made to find strategies to make them do so.

One approach has been to postulate the existence of two morphogens at opposite ends of a field of cells, with the stipulation that cells take their positional cues from the ratio of the levels of the two molecules. This situation, which typically requires substantial fine-tuning of parameters, was recently analyzed for exponentially shaped morphogen gradients (McHale et al., 2006) and extended to gradients of more general shape (Ben-Zvi and Barkai, 2010). Such models essentially replace the sink

effect in Wolpert’s original model with an independent positional cue that serves the same purpose. In either case, scaling occurs because the behavior of the system everywhere becomes coupled to what happens at both of its boundaries. Such coupling need not be direct. For example, because the rate of spatial decay of a morphogen diffusing within an epithelium depends on the apicobasal dimensions of cells (which influence the rate of morphogen leakage through the basement membrane), it follows that scaling of a morphogen gradient can also be achieved by coupling increases in the planar dimensions of an epithelial sheet to increases in the apicobasal dimensions of its cells (Lander et al., 2011). Indeed, the apicobasal lengths of cells of insect imaginal discs do increase in parallel with the growth of discs as a whole.

An intriguing class of strategies for scaling was recently identified as a result of efforts to reverse engineer dorsoventral (D-V) axis specification in vertebrate embryos. In early embryos, the specification of cell fate depends on a nonuniform pattern of BMP signaling along the D-V axis. Based on work in amphibians and fish, as well as extrapolated findings from the homologous patterning process in insects, it is believed that BMP signaling occurs in a steep ventral-to-dorsal gradient due to the combined effect of initial expression of BMPs (e.g., Bmp 2, 4, and 7) on the ventral side and a process of facilitated ventral-ward transport mediated by the BMP-binding protein chordin (which is produced dorsally, in the Spemann organizer). Curiously, an additional BMP ligand, known as Admp, is also produced on the dorsal side of both fish and frog embryos, and its expression is inhibited by BMP signaling. The fact that a BMP ligand is expressed on the opposite side of the embryo from where BMP signaling is needed, combined with the fact that the expression of this ligand is sensitive to the signaling pathway that is least active at the location where it is expressed, strongly suggested that the performance objectives of the D-V specification system involve more than just elaborating a simple morphogen gradient.

This insight led to the discovery that Admp is required for scaling the BMP gradient to the size of the embryo (Reversade and De Robertis, 2005). Such scaling is evident in the behaviors of surgically manipulated embryos, as well as in normal embryo-to-embryo variation. Subsequently, a mathematical model was developed to explain how such scaling works (Ben-Zvi et al., 2008). More recently, a general design principle, termed expansion-repression control, was extracted from this mechanism. This principle can be invoked whenever graded morphogen signaling inhibits a process that would otherwise lead to runaway expansion of the morphogen gradient itself (Figure 3D). If the expander has a long range of action, then only when its expression is driven nearly to zero can the morphogen gradient reach a steady state. This will occur only when the morphogen gradient has expanded essentially to the edge of the field capable of making the expander, i.e., when the morphogen gradient fills the tissue it is patterning. In the case of the amphibian embryo, Admp plays the role of expander, and facilitated transport through the actions of chordin ensures that Admp acts over a long range.

This scaling mechanism is an example of integral feedback control (Figure 3D) (Ben-Zvi and Barkai, 2010). Because the expander is assumed to be long-lived, its levels reflect the time-integral of the error between where the morphogen gradient is and where it needs to be. Only when the error is driven essentially to zero is a steady state achieved. By uncovering such a general engineering strategy in this mechanism, the authors achieve several important ends. First, they gain the ability to assert that, even if their detailed, explicit model of amphibian D-V scaling (Ben-Zvi et al., 2008) is inaccurate in its specifics (as some have argued; Francois et al., 2009), it is likely to be correct in its general outlines. (To quote a phrase popular among systems biologists, “all models are wrong; some are useful.” Box, 1979.) Second, they gain the ability to enumerate other classes of mechanisms that are mathematically equivalent, even if mechanistically dissimilar.

For example, they show that gradient expansion can be mediated not only by a secondary morphogen (like Admp) but by any substance that regulates transport of a morphogen, including a diffusible inhibitor that also protects a morphogen from receptor-mediated capture. Interestingly, it was recently found that the secreted protein Pentagone, which is negatively regulated by the BMP-related morphogen Decapentaplegic (Dpp) in the Drosophila wing disc, is a potent expander of the Dpp gradient (Vuilleumier et al., 2010), consistent with a role in the known ability of the Dpp gradient to scale in response to experimental alterations in disc size (Teleman and Cohen, 2000). It has also been found that secreted Frizzled-related proteins, which are competitive inhibitors of Wnt-receptor interaction, act as expanders of Wnt gradients (Mii and Taira, 2009) and during early amphibian embryogenesis are expressed in patterns consistent with negative regulation by Wnts. Although it remains to be tested whether any of these mechanisms is truly involved in the scaling of morphogen systems, the above discussion illustrates how the systematic reverse engineering of a complicated biological system can lead to the generation of novel, testable hypotheses.

The Management of Noise

A major contribution of systems biology has been to increase awareness of the roles played by noise in biological systems. Here, noise is defined as variations that originate in random or unpredictable molecular and cellular behaviors. Noise is not just microscopic fluctuation that averages out at the macroscopic level; it can be both a hindrance and a help to biological function. It can degrade precision, but it can also operate switches (Hasty et al., 2000), sustain and synchronize oscillations (Lewis, 2003), amplify signals (Paulsson et al., 2000), or determine stem cell dynamics (Hoffmann et al., 2008). The flow of noise through a network does not behave like the flow of substrates through a biochemical pathway. Noise can increase due to stochastic phenomena placed in series but can also decrease due to time integration (temporal filtering) as well as feedback and feedforward effects in which correlations in noise are exploited to produce destructive interference.

The noise that affects pattern formation can be both temporal and spatial. These distinct types of noise arise from the same processes, but the latter occurs when temporal fluctuations are independent from cell to cell. In general, noise has both a time scale (the time over which one would need to average

to lower noise, at any given location, by a given fraction) and a length scale (the distance over which one would need to average to lower noise, at any given time, by a given fraction). Such averaging times and distances are not just a function of the amplitude of noise but also its structure, i.e., whether fluctuations occur independently in time (Poisson or shot noise) or space (no cell-to-cell coordination) or exhibit correlations (e.g., bursting and cell-to-cell cooperation). One of the biggest surprises in recent years has been the realization that the majority of gene expression in both prokaryotes and eukaryotes is subject to large-amplitude, slow-varying, high-burst noise (Raj et al., 2006).

When morphogens provide positional information over long distances, one impact of noise is to limit the precision with which position can be specified (Figure 4). The randomness of morphogen diffusion creates temporal noise at every location; this sets an integration time over which responding cells (or nuclei, in the case of the intracellular morphogen gradients of syncytial embryos) must sum up their measurements of morphogen concentration to adequately filter such noise. For the intracellular Bicoid gradient of the early Drosophila embryo, this limitation can be significant because nuclei erase their “reading” of the Bicoid level with every nuclear division (which occurs every 10–20 min). To combat this problem, it has been proposed that spatial averaging occurs, with nuclei influencing the readings made by their neighbors (Gregor et al., 2007). This illustrates the point that both spatial and temporal strategies can effectively manage noise in tissue patterning.

For extracellular morphogens, signal integration times are likely much longer than for Bicoid, suggesting that local fluctuations in morphogen concentration may not be particularly important, especially when compared with another source of noise: receptors. Numbers of receptors will vary from cell to cell, and even among cells with identical numbers of receptors, receptor occupancy will vary due to the stochastic nature of binding and unbinding (Lauffenburger and Linderman, 1993). Both gene expression noise and morphogen-receptor binding noise will tend to vary on slow time scales, set in one case by the characteristics of transcriptional and translational bursting and of mRNA and protein turnover, and in the other case by the rate of the turnover of bound receptors. Because the range over which a morphogen gradient spreads also depends upon the rate of morphogen uptake and turnover, it has been argued that, for morphogen gradients of typical biological length scales (50–200 μm), receptor binding noise will usually be too slow to remove by simple time averaging (Lander et al., 2009b).

Figure 4. Reducing Noise in Pattern Formation
In boundary-organized pattern formation, the ability of cells to form organized patterns depends upon the accuracy with which they can measure their positions within a morphogen gradient. Under idealized conditions (A), cells that autonomously adopt a new behavior at a particular threshold value of morphogen concentration will produce a sharp spatial border. In reality, reading a morphogen gradient is fraught with noise: variability in morphogen level, in gene expression, and in cell size, and the stochastic nature of biochemical processes will cause autonomously acting cells to produce “salt-and-pepper” borders (B). Many sources of noise lead to fluctuations on a time scale too slow for cells to compensate simply by integrating signals over time. In principle, processes that enable cells to collaborate with their neighbors can also reduce the noisiness in morphogen gradient interpretation, producing smoother borders. Such collaboration can take many forms. For example, in (C), the noisy signal in panel B was used to drive the production of an activator in a Turing process (the activator induces its own longer-range inhibitor), the level of which was used as a source of positional information. Note the improved border sharpness. Analyses such as this suggest that the combined use of different modes of pattern organization (boundary- versus self-organized) can be useful in achieving robust patterning.

Noise and Tradeoffs

If the effects of noise on the precision of patterning are not easily removed by time averaging, how might they be overcome? Looking at this question from a forward engineering perspective sheds light not only on how performance drives the evolution of complex biological regulation but also on the importance of tradeoffs: the problem that controlling one aspect of performance often degrades another (Figure 5). To see this, it is important to recognize that the positional uncertainty created by noise in the interpretation of a morphogen gradient depends not only on the character of the noise but also on the length scale (steepness) of the gradient. This is because, in a steeper gradient, the distance over which cells can distinguish whether they are at one location or another is shorter.

Thus, one way to make a gradient more noise resistant is to make it steeper. Yet, from any given starting amplitude, making a gradient steeper shortens its range. In principle, increasing the starting amplitude of a morphogen gradient (i.e., producing more

morphogen) will extend its range, but this strategy is limited by another problem: receptor saturation. It turns out that significant saturation of receptors near the source of a morphogen gradient dramatically degrades robustness such that small changes in morphogen production rate (e.g., due to environmental or genetic variation) produce large changes in the shape of the gradient (Lander et al., 2009b).

The tradeoff between receptor saturation at one end of a morphogen gradient and noise at the other end is constrained by the biochemistry of ligand binding and the number of receptors per cell. Strategies for overcoming these constraints create additional problems. Increasing the number of receptors per cell increases morphogen capture but shortens the gradient. Upregulating morphogen destruction can produce arbitrary robustness to fluctuations in morphogen levels (Eldar et al., 2003), but because it makes gradients shallower far from the morphogen source, it also ends up increasing the effect of noise on precision (Lander et al., 2009b). What forward engineering tells us is that performance goals related to precision, robustness, and pattern size can always be expected to interact with each other (e.g., Figure 5A). It has recently been argued (Lander et al., 2009b) that such tradeoffs offer a more plausible explanation for the relatively short distances (50–100 cells) (Wolpert, 1969) over which morphogens act than physical limitations on the speed at which morphogens spread (Crick, 1970).

With this in mind, we might try to reverse engineer some of the complex mechanisms observed in morphogen gradients, to determine whether any of them might help with these tradeoffs. For example, it has commonly been observed that extracellular morphogens accumulate in vesicular structures inside responding cells. If they continue to signal from such locations, as Dpp indeed does (Bokel et al., 2006), it would allow cells to achieve signal integration over times much longer than those dictated by rates of morphogen capture (Aquino and Endres, 2010). This explanation for intracellular morphogen accumulation provides an alternative to the still-controversial hypothesis that endocytosis plays an active role in morphogen transport. Another interesting phenomenon, the ability of Hedgehog (Hh) signaling to reflect the ratio of bound to free receptors, rather than the number of bound receptors (Casali and Struhl, 2004), might also serve as a noise-reduction strategy, as it automatically cancels out the effects of temporal fluctuations in receptor number.

Figure 5. Performance Tradeoffs and Morphogen Gradients
(A) For simple morphogen gradients formed by diffusion with constant receptor-mediated uptake, the ability to achieve performance objectives (e.g., robustness to uncertainty in morphogen production rate; positional precision; and patterning range) is constrained by unintended side effects of performance-enhancing strategies, such as altering levels of morphogen and receptor expression or function.
(B) These tradeoffs may be analyzed quantitatively, by calculating robustness and precision as a function of distance and gradient range. Sx,v is the sensitivity of position to the rate of morphogen production; w is the size of the window of imprecision due to ligand-binding noise. The filled box shows the “useful fraction” of a morphogen gradient, where performance constraints on both Sx,v and w are met.
(C) Parameter space exploration suggests that there is some distance beyond which a simple morphogen gradient cannot simultaneously achieve robustness to morphogen synthesis rate, and positional precision, at any location. Useful fractions are plotted as a function of patterning range for various values of gradient length scale. Panels B and C are adapted from Lander et al. (2009b).

Spatial Control of Noise

The fact that the spatial character of noise is ultimately what degrades precision in a morphogen gradient suggests that we should also be looking for noise-reduction strategies that are explicitly spatial. For example, in the Bicoid gradient system, the fact that the Bicoid target gene Hunchback (Hb) is itself a diffusible molecule allows the effects of fluctuations in Bicoid signaling at one nucleus to be averaged over many nuclei (Erdmann et al., 2009; Gregor et al., 2007; Okabe-Oho et al., 2009). The diffusivity of Hb improves precision by ironing out Bicoid fluctuations, but it also degrades precision by making the Hb boundary less steep, a tradeoff that leads to the prediction of a maximal effect at an optimal diffusivity (Erdmann et al., 2009). The general strategy of using a morphogen gradient to trigger a secondary process that, because it involves diffusion, smooths out spatial noise can also be seen in extracellular morphogen systems. For example, in the Drosophila wing disc, anteroposterior positional information is first provided by a Hh gradient, which acts at short range to induce the longer-range

Dpp gradient (Zecca et al., 1995). Any short-range imprecision in the way cells interpret the Hh gradient will be smoothed out in the Dpp gradient.

Spatial averaging through diffusion need not involve induction of new morphogens. Induction of diffusible inhibitors or coregulators can have a similar effect. Indeed, recent work in the Drosophila wing disc indicates that signaling by the Wg gradient may induce the production of at least two types of diffusible inhibitors (Piddini and Vincent, 2009).

A particularly powerful strategy for overcoming spatial noise is the use of what might be called “triggered self-organization.” In the developmental biology literature, self-organizing patterns are typically invoked as a means to generate repeated structures (e.g., fields of spots or stripes). However, using a long-range morphogen gradient as input to a process that sets up a Turing pattern (e.g., expression of an activator or inhibitor), it is possible to trigger the formation of a single transition or peak at a specific location in space (Koch and Meinhardt, 1994). Because such a self-organizing process is driven by diffusion (of information, if not always molecules), it will tend to average out spatial noise over a length scale related to the parameters of the process itself (Figure 4C). For some self-organizing processes, such as the patterns that result from Notch-Delta-mediated lateral inhibition, the precision of patterning can, apparently, even be improved by the addition of spatial and temporal noise (Cohen et al., 2010).

The formation of veins during the pupal stage of fly wing development illustrates the idea of collaboration between long-range morphogens and local self-organization. Initially, the Dpp gradient in the larval wing disc establishes wing vein primordia relatively imprecisely. Later, the initiation within those primordia of a series of events involving short-range activation and long-range inhibition or depletion (utilizing Notch, Dpp, and EGF signaling) (Blair, 2007; Yan et al., 2009) results in the formation of narrow veins through the centers of those primordia. Evidence that a Turing process is involved comes from the analysis of mutations that broaden the vein primordia. In these cases, what is observed is not broader veins but extra veins in the same territory (Biehs et al., 1998); this is the expected result when the domain over which a Turing process is triggered becomes large compared to its intrinsic wavelength. Another situation in which a form of self-organization sharpens a domain initially specified broadly by Dpp occurs in the Drosophila embryo, where long-range Dpp transport toward the dorsal side of the embryo produces a broad peak of Dpp signaling, which in turn triggers a process that boosts Dpp signaling at short range but inhibits it at long range (Umulis et al., 2006; Wang and Ferguson, 2005).

Tradeoffs between Self-Organization and Boundary-Organized Control

Patterns produced by self-organization are relatively insensitive to external positional cues. This insensitivity is a liability when the goal of patterning is to position new events in relation to the locations of earlier events, especially if that relationship needs to be adjustable through feedback. For example, the automatic scaling of Turing processes is not easy to achieve (see, e.g., Ishihara and Kaneko, 2006; Othmer and Pate, 1980; Umulis et al., 2008). For such purposes, long-range morphogen gradients do much better. In contrast, processes that self-organize in space counteract the effects of noise because they naturally average spatial information. In principle, development can reap the benefits of the scalability of long-range gradients and the noise reduction of self-organizing processes by linking the two together (Figure 4C). This works particularly well with Turing processes confined to small domains because under these conditions they are most sensitive to external positional information, such as boundary conditions. This suggests that some of the most prevalent uses of Turing processes in development may involve situations in which they aren’t forming fields of spots and stripes. Certainly, the Turing process that sets up left-right patterning in vertebrates (creating a single boundary) fits this description (Nakamura et al., 2006).

It remains to be seen how many developmental events involve collaborations between long-range morphogens and Turing processes. However, if we consider other locally self-organizing phenomena (e.g., cell-to-cell signaling networks established by Notch-Delta interactions, the planar cell polarity pathway, or Hippo signaling), it is easy to envision morphogenesis as the result of a continual back-and-forth between long-range signals and local events. Long-range signals preserve and flexibly control positional information but are easily degraded by noise, whereas local events are less flexible but remove noise and boost signal, acting much like boosters and repeaters in electrical power and wireless data transmission.

This viewpoint helps resolve a seeming paradox emerging out of the study of the Drosophila Bicoid gradient. On the one hand, it has been shown that the Bicoid gradient and its immediate interpretation by nuclei are remarkably precise relatively early in development (Gregor et al., 2007). On the other hand, there is evidence that severe perturbations of the gradient, including “flattening” it by altering the location of bicoid mRNA in the embryo (Ochoa-Espinosa et al., 2009) or “stirring” it by unevenly heating the embryo (Lucchetta et al., 2008), alter the positions of anteroposterior gene expression domains far less than predicted, suggesting that a great deal of precision arises through self-organization. Such self-organization can arise through the elaborate, cross-regulatory gene networks that exist among the cascade of genes whose expression is initially triggered by Bicoid (reviewed by Papatsenko, 2009). As first suggested over two decades ago (Edgar et al., 1989; Lacalli et al., 1988) and verified recently (Manu et al., 2009a), gap gene cross-regulatory interactions create dynamic attractor states, the hallmark of self-organizing systems. Indeed, such self-organization seems to account for robustness of the Bicoid gradient to alterations in Bicoid level, at the same time producing a certain amount of scaling to embryo size (Manu et al., 2009b). The question left unanswered by these studies is why, if the primary role of Bicoid is to activate a system that achieves much of its robustness through self-organization, is the Bicoid gradient as precise as it is? The reverse engineer will always respond to such a question by suggesting that there are additional performance objectives that we are failing to take into account. What those might be remains to be determined.

The notion of morphogen gradients as triggers of self-organization also squares well with recent studies of Hedgehog gradients in both invertebrates and vertebrates, and of retinoic acid

gradients in the vertebrate hindbrain. In such systems, growing evidence suggests that cell fates are dictated primarily by the history and duration of morphogen exposure rather than simply the steady-state amount of morphogen (Dessaud et al., 2007, 2010; Maves and Kimmel, 2005). In the vertebrate spinal cord, Sonic Hedgehog-induced fate switching depends upon cross-regulatory interactions among transcription factors that, as with the gap genes of the Drosophila embryo, may be seen as producing a series of attractor states (Briscoe, 2009; Lek et al., 2010). Such a system may be described as one that is self-organizing in time, with the morphogen gradient applying a spatial bias to the process. In the Drosophila wing disc, the Hedgehog gradient also uses a temporal mechanism to produce multiple borders of gene expression (Nahmad and Stathopoulos, 2009). Clearly, the notion of what long-range morphogens do has come a long way since the early French flag models (Wolpert, 1969, 2011).

Matching Growth to Pattern

It makes intuitive sense that changes in tissue size, which can result from nutritional or environmental variability, should give rise to compensatory changes in the elaboration of positional cues by morphogens. It is less obvious why the reverse should also be true: that quantitative changes in morphogen function cause marked changes in tissue growth. Yet virtually all known morphogens are growth regulators, and most are growth promoters. In many systems, growth is dependent upon the expression of the very same morphogens that establish pattern. The best-studied examples come from the Drosophila wing disc, wherein both Dpp and Wg are important positive growth regulators (Baena-Lopez et al., 2009; Schwank and Basler, 2010). Researchers have long focused on explaining the curious observation that cells in the part of the wing disc patterned by Dpp and Wg proliferate in a more or less uniform pattern, whereas the morphogens that are essential for driving that proliferation are distinctly graded in a central-to-peripheral fashion.

Recent studies suggest that the answer to this puzzle has to do with the way in which morphogens interact with the Fat/Hippo pathway. Among the most interesting effects are non-cell-autonomous ones: Yki activity and subsequent cell proliferation are highly induced in wing disc cells that express targets of morphogen signals (from either Dpp or Wg) at levels substantially higher or lower than those of their immediate neighbors (Rogulja et al., 2008; Zecca and Struhl, 2010). In the case of Dpp, it has been proposed that this nonautonomous effect arises because graded Dpp signaling leads to graded expression of the Fat ligand Dachsous and its regulator Four-jointed, which in turn leads to asymmetry in Fat occupancy across each wing disc cell, the magnitude of which serves as an inhibitory input to the Hippo pathway (Rogulja et al., 2008).

One consequence of this mechanism is that cells in the morphogen field receive signals to proliferate that are dependent upon the local slope of the morphogen gradient. For an exponentially declining gradient, which is a good approximation of the wing disc Dpp gradient, the slope, measured relative to morphogen concentration at each point, will be a constant, potentially explaining how such a gradient could drive spatially uniform proliferation.

The idea that the slope of a morphogen gradient could be useful in the control of proliferation is, in fact, a relatively old one, having been suggested by the phenomenon of intercalary regeneration. This refers to the tendency of some embryonic or adult structures to respond to surgical manipulations by growing selectively at the locations where cells that had previously been distant become juxtaposed. The hypothesis is that growth occurs whenever the slope of a gradient of positional information from one cell to another exceeds a threshold. This is, in effect, the reverse of scaling of pattern to size; it amounts to the scaling of size to pattern. Interestingly, recent studies show that intercalary regeneration in wing discs involves activation of Yki via regulation of the Hippo pathway (Grusche et al., 2010a; Halder and Johnson, 2011; Sun and Irvine, 2011).

Such observations tempt speculation that there might be a single unifying principle, measuring and responding to the slope of graded positional information, underlying the role of the Hippo pathway in growth control. Unfortunately, this is almost certainly too simplistic. Dpp clearly affects Yki signaling through a combination of cell-autonomous effects (that depend upon Dpp level, not gradient slope) and nonautonomous ones (Rogulja et al., 2008; Schwank et al., 2011). And although Dpp's effects on the expression of the Fat ligand Dachsous and its regulator Four-jointed can explain observed proliferative effects when cells are forced into contact with neighbors that differ greatly in their levels of Dpp response, it appears that the shapes of the endogenous gradients of Dachsous and Four-jointed in the wing disc are rather Dpp independent (Schwank et al., 2011). Similarly, whereas early studies suggested that proliferative responses to neighbor-neighbor differences in Wg signaling (which induce Yki activity) (Zecca and Struhl, 2010) might be the primary means by which the Wg gradient drives growth (Baena-Lopez and Garcia-Bellido, 2006), later work showed that uniform, moderate levels of Wg are a potent stimulus for proliferation, and further that the slope of the Wg gradient is too shallow throughout most of the wing disc to elicit nonautonomous effects (Baena-Lopez et al., 2009). Adding to these observations, recent work by Schwank et al. (2011) indicates that Fat/Hippo signaling can be graded along the anteroposterior axis of the wing disc even when there is no apparent morphogen gradient in that direction. Although Schwank et al. suggest that Fat/Hippo and Dpp signaling are independent growth-control pathways that act in different domains of the disc, another explanation is that Hippo signaling integrates positional information that is coming from a source not yet accounted for in any current models.

That source could be mechanical force. As described earlier, there are strong suggestions that tension and compression within the wing disc epithelium play a role in growth control (Aegerter-Wilmsen et al., 2007, 2010; Hufnagel et al., 2007). Recent stress-birefringence measurements indicate the presence of a central-to-peripheral compression gradient in wing discs (Nienhaus et al., 2009). Given the close connection between upstream components of the Hippo pathway and components of cell junctions and the cytoskeleton, it has been speculated that the Hippo pathway directly receives mechanical inputs (Grusche et al., 2010b). Clearly, there is substantial need for experimental clarification of the role of mechanical events in Hippo pathway signaling.
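
The earlier observation that an exponentially declining gradient has a constant relative slope can be checked in a few lines. The following is only a sketch under an assumed functional form: the amplitude $C_0$ and decay length $\lambda$ are symbols introduced here for illustration and do not appear in the essay.

```latex
% Assumed form of the gradient (illustrative): amplitude C_0, decay length \lambda
\begin{align}
C(x) &= C_0\, e^{-x/\lambda}, \\
% Absolute slope: still depends on position through C(x)
\frac{dC}{dx} &= -\frac{C_0}{\lambda}\, e^{-x/\lambda} = -\frac{C(x)}{\lambda}, \\
% Slope measured relative to local concentration: position drops out
\frac{1}{C(x)} \left| \frac{dC}{dx} \right| &= \frac{1}{\lambda}.
\end{align}
```

Under this assumption the relative slope equals $1/\lambda$ at every position $x$, so a readout that depends on the fractional difference in signal between neighboring cells would be the same throughout the field, which is consistent with the suggestion that such a gradient could drive spatially uniform proliferation.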

On the other hand, that source could come from morphogen gradients themselves, through mechanisms related to their ability to trigger self-organizing processes at specific locations. As has been pointed out, the Wg target gene vestigial (vg), which seems to play a direct role in driving wing disc growth, displays autoactivation (Zecca and Struhl, 2007, 2010), yet vg also seems to be the target of short-range negative feedback, through an unknown mechanism (Piddini and Vincent, 2009). Arguments based on mathematical modeling posit that such an arrangement creates a situation in which a steady-state balance between Vg-promoted growth, Vg-promoted vg expression, and Wg-dependent inhibition of vg expression is only achieved at a fixed tissue size (Zhu, 2011). Because this recent work explored only a limited number of selected parameters, it remains to be seen whether this mechanism truly operates in wing discs. Nevertheless, it represents an elegant solution to the problem of achieving set-point control of growth over a relatively long length scale.

It should be noted that the simultaneous existence of mechanisms to scale pattern to growth and growth to pattern could create, if such mechanisms were truly independent of each other, futile cycles, with each process continually driving the other. Clearly, these mechanisms cannot be independent, but how they are linked remains unclear. One intriguing possibility is that it involves cell-surface glypicans, such as Dally and Dally-like, which have been shown to be direct targets of Hippo pathway signaling (Rodriguez et al., 2008). These molecules, and their mammalian orthologs, have been strongly implicated in both organ size control (Filmus and Capurro, 2008; Selleck, 1999; Takeo et al., 2005) and the regulation of patterning by morphogen gradients (e.g., Belenkaya et al., 2004; Franch-Marro et al., 2005; Galli et al., 2003; Han et al., 2004, 2005; Kreuger et al., 2004).

Lessons and Implications

The remarkable regulation, canalization, robustness, and precision of embryonic development suggest that developing systems devote a considerable amount of cellular machinery to the explicit purpose of control. Through forward and reverse engineering, it has become possible to systematically explore some of the control challenges faced by developing embryos and link such challenges to enabling mechanisms. In true systems biology fashion, such work strives to explain the complexity of developmental mechanisms in terms of the coordinated functions of entire systems, and not just those of individual parts. The studies highlighted above provide general lessons for future research into control of morphogenesis and development. Three of the most salient lessons are described below.

(1) Performance Is Always Subject to Tradeoffs

The engineering dictum that "there's no free lunch" makes the point that control always comes at a cost. Making a system robust in one way invariably makes it fragile in another (Doyle and Csete, 2007). Tightly controlling cell number can hamper regeneration speed; controlling how robust morphogen gradients are to fluctuations in morphogen levels can affect how sensitive they are to noise; using spatially self-organizing processes to suppress the effects of noise can make spatial scaling more challenging. In general, the researcher who proposes that a particular mechanism fulfills a particular control function should consider how other types of performance are degraded as a result of such control. Unfortunately, this is a standard not often met in the current literature, even in journals with a strong systems biology focus.

(2) Control Is Not Micromanagement

It is easy to think that, in order for a system to be under tight control, every part of it must be controlled. Yet the strategy of integral feedback (Figure 3) shows that extremely tight control can be achieved merely by feeding back the right kind of signal at one point in a network into an earlier point. The intervening dynamics needn't be subject to any special regulation (even though it may appear that they are). For example, as long as there is any sort of feedback control on the renewal probabilities of stem cells, such cells can be wildly stochastic in their individual behaviors (e.g., Chang et al., 2008; Clayton et al., 2007; Gomes et al., 2011; Snippert et al., 2010) yet still give us the impression that they "know" precisely what they are doing. Failure to recognize that tightly controlled systems can include uncontrolled parts may well explain the discomfort that many biologists have long had with random processes, such as stochastic cell-fate switches (Chang et al., 2008), and the idea that mere diffusion creates morphogen gradients. Recent assertions that diffusion is "too messy" (i.e., too hard to control) to get the job done (Wolpert, 2009, 2011) indicate that confusion between micromanagement and control is very much alive in biology. In fact, because control always comes at a price, systems that achieve it without micromanaging are often better off.

(3) Phenotype Is Not Performance

Experimental genetics has become an indispensable tool of developmental biology because it enables us to infer causal connections between mechanisms (gene activity) and observations (phenotypes). Impressed with the intricate beauty of the networks that such approaches construct, it is easy to lose sight of the fact that what evolution selects for is not phenotype per se but performance, the ability of phenotype to do something useful. Too often, the phenomena we choose to investigate are selected for study based on criteria that may be only obliquely related to actual importance to the organism. For example, the high precision of the Drosophila Bicoid gradient is fascinating to us, but we still don't know how much it matters to the fly. In the wing disc, numerous studies have focused on explaining how spatially graded morphogens drive spatially uniform growth, when we actually have no evidence that the uniformity of growth is itself particularly important. From a traditional molecular biology standpoint, these are not serious problems because investigating these phenomena will likely still lead us to new mechanisms. But for the systems biologist, who seeks to use notions of design and performance to place mechanisms into context, there is a real need to get a systematic handle on what is phenomenon and what is epiphenomenon. Luckily, this is just what the tools of forward and reverse engineering provide. As our knowledge of the mechanistic complexity of developmental regulation grows, we can expect to see such approaches playing an ever-greater role in making sense of it all.

ACKNOWLEDGMENTS

This work was supported by the NIH (P50-GM076516 and R01-GM067247). I am grateful to Qing Nie, Anne Calof, and Marcos Nahmad for helpful conversations.

REFERENCES

Aegerter-Wilmsen, T., Aegerter, C.M., Hafen, E., and Basler, K. (2007). Model for the regulation of size in the wing imaginal disc of Drosophila. Mech. Dev. 124, 318–326.
Aegerter-Wilmsen, T., Smith, A.C., Christen, A.J., Aegerter, C.M., Hafen, E., and Basler, K. (2010). Exploring the effects of mechanical feedback on epithelial topology. Development 137, 499–506.
Alon, U., Surette, M.G., Barkai, N., and Leibler, S. (1999). Robustness in bacterial chemotaxis. Nature 397, 168–171.
Aquino, G., and Endres, R.G. (2010). Increased accuracy of ligand sensing by receptor internalization. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 81, 021909.
Artavanis-Tsakonas, S., Rand, M.D., and Lake, R.J. (1999). Notch signaling: cell fate control and signal integration in development. Science 284, 770–776.
Baena-Lopez, L.A., and Garcia-Bellido, A. (2006). Control of growth and positional information by the graded vestigial expression pattern in the wing of Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 103, 13734–13739.
Baena-Lopez, L.A., Franch-Marro, X., and Vincent, J.P. (2009). Wingless promotes proliferative growth in a gradient-independent manner. Sci. Signal. 2, ra60.
Barkai, N., and Shilo, B.Z. (2009). Robust generation and decoding of morphogen gradients. Cold Spring Harb. Perspect. Biol. 1, a001990.
Belenkaya, T.Y., Han, C., Yan, D., Opoka, R.J., Khodoun, M., Liu, H., and Lin, X. (2004). Drosophila Dpp morphogen movement is independent of dynamin-mediated endocytosis but regulated by the glypican members of heparan sulfate proteoglycans. Cell 119, 231–244.
Ben-Zvi, D., and Barkai, N. (2010). Scaling of morphogen gradients by an expansion-repression integral feedback control. Proc. Natl. Acad. Sci. USA 107, 6924–6929.
Ben-Zvi, D., Shilo, B.Z., Fainsod, A., and Barkai, N. (2008). Scaling of the BMP activation gradient in Xenopus embryos. Nature 453, 1205–1211.
Biehs, B., Sturtevant, M.A., and Bier, E. (1998). Boundaries in the Drosophila wing imaginal disc organize vein-specific genetic programs. Development 125, 4245–4257.
Blair, S.S. (2007). Wing vein patterning in Drosophila and the analysis of intercellular signaling. Annu. Rev. Cell Dev. Biol. 23, 293–319.
Bokel, C., Schwabedissen, A., Entchev, E., Renaud, O., and Gonzalez-Gaitan, M. (2006). Sara endosomes and the maintenance of Dpp signaling levels across mitosis. Science 314, 1135–1139.
Box, G.E.P. (1979). Robustness in the strategy of scientific model building. In Robustness in Statistics, R.L. Launer and G.N. Wilkinson, eds. (New York: Academic Press), pp. 201–236.
Briscoe, J. (2009). Making a grade: Sonic Hedgehog signalling and the control of neural cell fate. EMBO J. 28, 457–465.
Buttitta, L.A., and Edgar, B.A. (2007). How size is controlled: from Hippos to Yorkies. Nat. Cell Biol. 9, 1225–1227.
Casali, A., and Struhl, G. (2004). Reading the Hedgehog morphogen gradient by measuring the ratio of bound to unbound Patched protein. Nature 431, 76–80.
Chang, H.H., Hemberg, M., Barahona, M., Ingber, D.E., and Huang, S. (2008). Transcriptome-wide noise controls lineage choice in mammalian progenitor cells. Nature 453, 544–547.
Clayton, E., Doupe, D.P., Klein, A.M., Winton, D.J., Simons, B.D., and Jones, P.H. (2007). A single type of progenitor cell maintains normal epidermis. Nature 446, 185–189.
Cohen, M., Georgiou, M., Stevenson, N.L., Miodownik, M., and Baum, B. (2010). Dynamic filopodia transmit intermittent Delta-Notch signaling to drive pattern refinement during lateral inhibition. Dev. Cell 19, 78–89.
Crick, F.H.C. (1970). Diffusion in embryogenesis. Nature 225, 420–422.
Csikasz-Nagy, A., Novak, B., and Tyson, J.J. (2008). Reverse engineering models of cell cycle regulation. Adv. Exp. Med. Biol. 641, 88–97.
Dessaud, E., Yang, L.L., Hill, K., Cox, B., Ulloa, F., Ribeiro, A., Mynett, A., Novitch, B.G., and Briscoe, J. (2007). Interpretation of the sonic hedgehog morphogen gradient by a temporal adaptation mechanism. Nature 450, 717–720.
Dessaud, E., Ribes, V., Balaskas, N., Yang, L.L., Pierani, A., Kicheva, A., Novitch, B.G., Briscoe, J., and Sasai, N. (2010). Dynamic assignment and maintenance of positional identity in the ventral neural tube by the morphogen sonic hedgehog. PLoS Biol. 8, e1000382.
Doyle, J., and Csete, M. (2007). Rules of engagement. Nature 446, 860.
Edgar, B.A. (2006). How flies get their size: genetics meets physiology. Nat. Rev. Genet. 7, 907–916.
Edgar, B.A., Odell, G.M., and Schubiger, G. (1989). A genetic switch, based on negative regulation, sharpens stripes in Drosophila embryos. Dev. Genet. 10, 124–142.
Eldar, A., Rosin, D., Shilo, B.Z., and Barkai, N. (2003). Self-enhanced ligand degradation underlies robustness of morphogen gradients. Dev. Cell 5, 635–646.
Eldar, A., Shilo, B.Z., and Barkai, N. (2004). Elucidating mechanisms underlying robustness of morphogen gradients. Curr. Opin. Genet. Dev. 14, 435–439.
Elgjo, K., and Reichelt, K.L. (2004). Chalones: from aqueous extracts to oligopeptides. Cell Cycle 3, 1208–1211.
Erdmann, T., Howard, M., and ten Wolde, P.R. (2009). Role of spatial averaging in the precision of gene expression patterns. Phys. Rev. Lett. 103, 258101.
Filmus, J., and Capurro, M. (2008). The role of glypican-3 in the regulation of body size and cancer. Cell Cycle 7, 2787–2790.
Franch-Marro, X., Marchand, O., Piddini, E., Ricardo, S., Alexandre, C., and Vincent, J.P. (2005). Glypicans shunt the Wingless signal between local signalling and further transport. Development 132, 659–666.
Francois, P., Vonica, A., Brivanlou, A.H., and Siggia, E.D. (2009). Scaling of BMP gradients in Xenopus embryos. Nature 461, E1.
Galli, A., Roure, A., Zeller, R., and Dono, R. (2003). Glypican 4 modulates FGF signalling and regulates dorsoventral forebrain patterning in Xenopus embryos. Development 130, 4919–4929.
Gierer, A., and Meinhardt, H. (1972). A theory of biological pattern formation. Kybernetik 12, 30–39.
Gomes, F.L., Zhang, G., Carbonell, F., Correa, J.A., Harris, W.A., Simons, B.D., and Cayouette, M. (2011). Reconstruction of rat retinal progenitor cell lineages in vitro reveals a surprising degree of stochasticity in cell fate decisions. Development 138, 227–235.
Gregor, T., Tank, D.W., Wieschaus, E.F., and Bialek, W. (2007). Probing the limits to positional information. Cell 130, 153–164.
Grusche, F.A., Degoutin, J.L., Richardson, H.E., and Harvey, K.F. (2010a). The Salvador/Warts/Hippo pathway controls regenerative tissue growth in Drosophila melanogaster. Dev. Biol. 350, 255–266.
Grusche, F.A., Richardson, H.E., and Harvey, K.F. (2010b). Upstream regulation of the hippo size control pathway. Curr. Biol. 20, R574–R582.
Halder, G., and Johnson, R.L. (2011). Hippo signaling: growth control and beyond. Development 138, 9–22.
Han, C., Belenkaya, T.Y., Wang, B., and Lin, X. (2004). Drosophila glypicans control the cell-to-cell movement of Hedgehog by a dynamin-independent process. Development 131, 601–611.
Han, C., Yan, D., Belenkaya, T.Y., and Lin, X. (2005). Drosophila glypicans Dally and Dally-like shape the extracellular Wingless morphogen gradient in the wing disc. Development 132, 667–679.
Hasty, J., Pradines, J., Dolnik, M., and Collins, J.J. (2000). Noise-based switches and amplifiers for gene expression. Proc. Natl. Acad. Sci. USA 97, 2075–2080.
Hoffmann, M., Chang, H.H., Huang, S., Ingber, D.E., Loeffler, M., and Galle, J. (2008). Noise-driven stem cell and progenitor population dynamics. PLoS ONE 3, e2922.

Hufnagel, L., Teleman, A.A., Rouault, H., Cohen, S.M., and Shraiman, B.I. (2007). On the mechanism of wing size determination in fly development. Proc. Natl. Acad. Sci. USA 104, 3835–3840.
Ishihara, S., and Kaneko, K. (2006). Turing pattern with proportion preservation. J. Theor. Biol. 238, 683–693.
Jiao, Y., and Meyerowitz, E.M. (2010). Cell-type specific analysis of translating RNAs in developing flowers reveals new levels of control. Mol. Syst. Biol. 6, 419.
Kawauchi, S., Shou, J., Santos, R., Hebert, J.M., McConnell, S.K., Mason, I., and Calof, A.L. (2005). Fgf8 expression defines a morphogenetic center required for olfactory neurogenesis and nasal cavity development in the mouse. Development 132, 5211–5223.
Kawauchi, S., Kim, J., Santos, R., Wu, H.H., Lander, A.D., and Calof, A.L. (2009). Foxg1 promotes olfactory neurogenesis by antagonizing Gdf11. Development 136, 1453–1464.
Khammash, M. (2008). Reverse engineering: the architecture of biological networks. Biotechniques 44, 323–329.
Kim, J., Wu, H.H., Lander, A.D., Lyons, K.M., Matzuk, M.M., and Calof, A.L. (2005). GDF11 controls the timing of progenitor cell competence in developing retina. Science 308, 1927–1930.
Kirouac, D.C., Madlambayan, G.J., Yu, M., Sykes, E.A., Ito, C., and Zandstra, P.W. (2009). Cell-cell interaction networks regulate blood stem and progenitor cell fate. Mol. Syst. Biol. 5, 293.
Koch, A.J., and Meinhardt, H. (1994). Biological pattern formation: from basic mechanisms to complex structures. Rev. Mod. Phys. 66, 1481–1510.
Kondo, S., and Miura, T. (2010). Reaction-diffusion model as a framework for understanding biological pattern formation. Science 329, 1616–1620.
Kreuger, J., Perez, L., Giraldez, A.J., and Cohen, S.M. (2004). Opposing activities of Dally-like glypican at high and low levels of Wingless morphogen activity. Dev. Cell 7, 503–512.
Lacalli, T.C., Wilkinson, D.A., and Harrison, L.G. (1988). Theoretical aspects of stripe formation in relation to Drosophila segmentation. Development 104, 105–113.
Lander, A.D., Gokoffski, K.K., Wan, F.Y., Nie, Q., and Calof, A.L. (2009a). Cell lineages and the logic of proliferative control. PLoS Biol. 7, e15.
Lander, A.D., Lo, W.C., Nie, Q., and Wan, F.Y. (2009b). The measure of success: Constraints, objectives, and tradeoffs in morphogen-mediated patterning. Cold Spring Harb. Perspect. Biol. 1, a002022.
Lander, A.D., Nie, Q., Vargas, B., and Wan, F.Y.M. (2011). Size-normalized robustness of Dpp gradient in Drosophila wing imaginal disc. J. Mech. Mater. Struct., in press.
Lauffenburger, D.A., and Linderman, J.J. (1993). Receptors. Models for Binding, Trafficking and Signaling (New York: Oxford University Press).
Lek, M., Dias, J.M., Marklund, U., Uhde, C.W., Kurdija, S., Lei, Q., Sussel, L., Rubenstein, J.L., Matise, M.P., Arnold, H.H., et al. (2010). A homeodomain feedback circuit underlies step-function interpretation of a Shh morphogen gradient during ventral neural patterning. Development 137, 4051–4060.
Lewis, J. (2003). Autoinhibition with transcriptional delay: A simple mechanism for the zebrafish somitogenesis oscillator. Curr. Biol. 13, 1398–1408.
Lo, W.C., Chou, C.S., Gokoffski, K.K., Wan, F.Y., Lander, A.D., Calof, A.L., and Nie, Q. (2009). Feedback regulation in multistage cell lineages. Math. Biosci. Eng. 6, 59–82.
Lucchetta, E.M., Vincent, M.E., and Ismagilov, R.F. (2008). A precise Bicoid gradient is nonessential during cycles 11-13 for precise patterning in the Drosophila blastoderm. PLoS ONE 3, e3651.
Mammoto, A., and Ingber, D.E. (2009). Cytoskeletal control of growth and cell fate switching. Curr. Opin. Cell Biol. 21, 864–870.
Manu, Surkova, S., Spirov, A.V., Gursky, V.V., Janssens, H., Kim, A.R., Radulescu, O., Vanario-Alonso, C.E., Sharp, D.H., Samsonova, M., et al. (2009a). Canalization of gene expression and domain shifts in the Drosophila blastoderm by dynamical attractors. PLoS Comput. Biol. 5, e1000303.
Manu, Surkova, S., Spirov, A.V., Gursky, V.V., Janssens, H., Kim, A.R., Radulescu, O., Vanario-Alonso, C.E., Sharp, D.H., Samsonova, M., et al. (2009b). Canalization of gene expression in the Drosophila blastoderm by gap gene cross regulation. PLoS Biol. 7, e1000049.
Marciniak-Czochra, A., Stiehl, T., Ho, A.D., Jager, W., and Wagner, W. (2009). Modeling of asymmetric cell division in hematopoietic stem cells–regulation of self-renewal is essential for efficient repopulation. Stem Cells Dev. 18, 377–385.
Martin-Castellanos, C., and Edgar, B.A. (2002). A characterization of the effects of Dpp signaling on cell growth and proliferation in the Drosophila wing. Development 129, 1003–1013.
Maves, L., and Kimmel, C.B. (2005). Dynamic and sequential patterning of the zebrafish posterior hindbrain by retinoic acid. Dev. Biol. 285, 593–605.
McHale, P., Rappel, W.J., and Levine, H. (2006). Embryonic pattern scaling achieved by oppositely directed morphogen gradients. Phys. Biol. 3, 107–120.
Meinhardt, H., and Gierer, A. (2000). Pattern formation by local self-activation and lateral inhibition. Bioessays 22, 753–760.
Meir, E., von Dassow, G., Munro, E., and Odell, G.M. (2002). Robustness, flexibility, and the role of lateral inhibition in the neurogenic network. Curr. Biol. 12, 778–786.
Mii, Y., and Taira, M. (2009). Secreted Frizzled-related proteins enhance the diffusion of Wnt ligands and expand their signalling range. Development 136, 4083–4088.
Moolten, F.L., and Bucher, N.L. (1967). Regeneration of rat liver: transfer of humoral agent by cross circulation. Science 158, 272–274.
Nahmad, M., and Stathopoulos, A. (2009). Dynamic interpretation of hedgehog signaling in the Drosophila wing disc. PLoS Biol. 7, e1000202.
Nakamura, T., Mine, N., Nakaguchi, E., Mochizuki, A., Yamamoto, M., Yashiro, K., Meno, C., and Hamada, H. (2006). Generation of robust left-right asymmetry in the mouse embryo requires a self-enhancement and lateral-inhibition system. Dev. Cell 11, 495–504.
Nienhaus, U., Aegerter-Wilmsen, T., and Aegerter, C.M. (2009). Determination of mechanical stress distribution in Drosophila wing discs using photoelasticity. Mech. Dev. 126, 942–949.
Nijhout, H.F., and Grunert, L.W. (2010). The cellular and physiological mechanism of wing-body scaling in Manduca sexta. Science 330, 1693–1695.
Nowakowski, R.S., Caviness, V.S., Jr., Takahashi, T., and Hayes, N.L. (2002). Population dynamics during cell proliferation and neuronogenesis in the developing murine neocortex. Results Probl. Cell Differ. 39, 1–25.
Ochoa-Espinosa, A., Yu, D., Tsirigos, A., Struffi, P., and Small, S. (2009). Anterior-posterior positional information in the absence of a strong Bicoid gradient. Proc. Natl. Acad. Sci. USA 106, 3823–3828.
Okabe-Oho, Y., Murakami, H., Oho, S., and Sasai, M. (2009). Stable, precise, and reproducible patterning of bicoid and hunchback molecules in the early Drosophila embryo. PLoS Comput. Biol. 5, e1000486.
Othmer, H.G., and Pate, E. (1980). Scale-invariance in reaction-diffusion models of spatial pattern formation. Proc. Natl. Acad. Sci. USA 77, 4180–4184.
Pan, D. (2007). Hippo signaling in organ size control. Genes Dev. 21, 886–897.
Papatsenko, D. (2009). Stripe formation in the early fly embryo: principles, models, and networks. Bioessays 31, 1172–1180.
Paulsson, J., Berg, O.G., and Ehrenberg, M. (2000). Stochastic focusing: fluctuation-enhanced sensitivity of intracellular regulation. Proc. Natl. Acad. Sci. USA 97, 7148–7153.
Piddini, E., and Vincent, J.P. (2009). Interpretation of the wingless gradient requires signaling-induced self-inhibition. Cell 136, 296–307.
Raj, A., Peskin, C.S., Tranchina, D., Vargas, D.Y., and Tyagi, S. (2006). Stochastic mRNA synthesis in mammalian cells. PLoS Biol. 4, e309.
Reddy, B.V., and Irvine, K.D. (2008). The Fat and Warts signaling pathways: new insights into their regulation, mechanism and conservation. Development 135, 2827–2838.

968 Cell 144, March 18, 2011 ª2011 Elsevier Inc. Ren, F., Wang, B., Yue, T., Yun, E.Y., Ip, Y.T., and Jiang, J. (2010). Hippo von Dassow, G., Meir, E., Munro, E.M., and Odell, G.M. (2000). The segment signaling regulates Drosophila intestine stem cell proliferation through multiple polarity network is a robust developmental module. Nature 406, 188–192. 107 pathways. Proc. Natl. Acad. Sci. USA , 21064–21069. Vuilleumier, R., Springhorn, A., Patterson, L., Koidl, S., Hammerschmidt, M., Reversade, B., and De Robertis, E.M. (2005). Regulation of ADMP and BMP2/ Affolter, M., and Pyrowolakis, G. (2010). Control of Dpp morphogen signalling 4/7 at opposite embryonic poles generates a self-regulating morphogenetic by a secreted feedback regulator. Nat. Cell Biol. 12, 611–617. field. Cell 123, 1147–1160. Wang, Y.-C., and Ferguson, E.L. (2005). Spatial bistability of Dpp-receptor Rodriguez, I., Baena-Lopez, L.A., and Baonza, A. (2008). Upregulation of interactions during Drosophila dorsal-ventral patterning. Nature 434, 229–234. glypicans in Hippo mutants alters the coordinated activity of morphogens. Weiss, P. (1950). An Introduction to Genetic Neurology (Chicago: University of Fly (Austin) 2, 320–322. Chicago Press). Rogulja, D., Rauskolb, C., and Irvine, K.D. (2008). Morphogen control of wing growth through the Fat signaling pathway. Dev. Cell 15, 309–321. Williams, R.W. (2000). Mapping genes that modulate mouse brain development: A quantitative genetic approach. In Mouse Brain Development, A.M. Sahlin, P., Melke, P., and Jonsson, H. (2011). Models of sequestration and Goffient and P. Rakic, eds. (New York: Springer Verlag), pp. 21–49. receptor cross-talk for explaining multiple mutants in plant stem cell regulation. BMC Syst. Biol. 5,2. Wolpert, L. (1969). Positional information and the spatial pattern of cellular 25 Sauro, H.M., and Kholodenko, B.N. (2004). Quantitative analysis of signaling differentiation. J. Theor. Biol. , 1–47. networks. Prog. Biophys. Mol. Biol. 
86, 5–43. Wolpert, L. (2009). Diffusible gradients are out - an interview with Lewis Schwank, G., and Basler, K. (2010). Regulation of organ growth by morphogen Wolpert. Interviewed by Richardson, Michael K. Int. J. Dev. Biol. 53, 659–662. gradients. Cold Spring Harb. Perspect. Biol. 2, a001669. Wolpert, L. (2010). Arms and the man: the problem of symmetric growth. PLoS Schwank, G., Tauriello, G., Yagi, R., Kranz, E., Koumoutsakos, P., and Basler, Biol. 8, e1000477. K. (2011). Antagonistic growth regulation by dpp and fat drives uniform cell Wolpert, L. (2011). Positional information and patterning revisited. J. Theor. 20 proliferation. Dev. Cell , 123–130. Biol. 269, 359–365. Selleck, S.B. (1999). Overgrowth syndromes and the regulation of signaling Wu, H.H., Ivkovic, S., Murray, R.C., Jaramillo, S., Lyons, K.M., Johnson, J.E., complexes by proteoglycans. Am. J. Hum. Genet. 64, 372–377. and Calof, A.L. (2003). Autoregulation of neurogenesis by GDF11. Neuron 37, Shraiman, B.I. (2005). Mechanical feedback as a possible regulator of tissue 197–207. growth. Proc. Natl. Acad. Sci. USA 102, 3318–3323. Yamaguchi, M., Yoshimoto, E., and Kondo, S. (2007). Pattern regulation in the Shvartsman, S.Y., Wiley, H.S., Deen, W.M., and Lauffenburger, D.A. (2001). stripe of zebrafish suggests an underlying dynamic and autonomous mecha- Spatial range of autocrine signaling: modeling and computational analysis. nism. Proc. Natl. Acad. Sci. USA 104, 4790–4793. Biophys. J. 81, 1854–1867. Yan, S.J., Zartman, J.J., Zhang, M., Scott, A., Shvartsman, S.Y., and Li, W.X. Snippert, H.J., van der Flier, L.G., Sato, T., van Es, J.H., van den Born, M., (2009). Bistability coordinates activation of the EGFR and DPP pathways in Kroon-Veenboer, C., Barker, N., Klein, A.M., van Rheenen, J., Simons, B.D., Drosophila vein differentiation. Mol. Syst. Biol. 5, 278. et al. (2010). Intestinal crypt homeostasis results from neutral competition between symmetrically dividing Lgr5 stem cells. 
Cell 143, 134–144. Yi, T.M., Huang, Y., Simon, M.I., and Doyle, J. (2000). Robust perfect adaptation in bacterial chemotaxis through integral feedback control. Proc. Natl. Staley, B.K., and Irvine, K.D. (2010). Warts and Yorkie mediate intestinal regen- Acad. Sci. USA 97, 4649–4653. eration by influencing stem cell proliferation. Curr. Biol. 20, 1580–1587. Sun, G., and Irvine, K.D. (2011). Regulation of Hippo signaling by Jun kinase Zecca, M., and Struhl, G. (2007). Recruitment of cells into the Drosophila wing signaling during compensatory cell proliferation and regeneration, and in primordium by a feed-forward circuit of vestigial autoregulation. Development 134 neoplastic tumors. Dev. Biol. 350, 139–151. , 3001–3010. Takeo, S., Akiyama, T., Firkus, C., Aigaki, T., and Nakato, H. (2005). Expression Zecca, M., and Struhl, G. (2010). A feed-forward circuit linking wingless, fat- of a secreted form of Dally, a Drosophila glypican, induces overgrowth pheno- dachsous signaling, and the warts-hippo pathway to Drosophila wing growth. type by affecting action range of Hedgehog. Dev. Biol. 284, 204–218. PLoS Biol. 8, e1000386. Teleman, A.A., and Cohen, S.M. (2000). Dpp gradient formation in the Zecca, M., Basler, K., and Struhl, G. (1995). Sequential organizing activities of Drosophila wing imaginal disc. Cell 103, 971–980. engrailed, hedgehog and decapentaplegic in the Drosophila wing. 121 Turing, A.M. (1952). The chemical basis of morphogenesis. Philos. Trans. R. Development , 2265–2278. Soc. Lond., B 237, 37–72. Zhao, B., Wei, X., Li, W., Udan, R.S., Yang, Q., Kim, J., Xie, J., Ikenoue, T., Yu, Umulis, D., O’Connor, M.B., and Othmer, H.G. (2008). Robustness of embry- J., Li, L., et al. (2007). Inactivation of YAP oncoprotein by the Hippo pathway is onic spatial patterning in Drosophila melanogaster. Curr. Top. Dev. Biol. 81, involved in cell contact inhibition and tissue growth control. Genes Dev. 21, 65–111. 2747–2761. 
Umulis, D.M., Serpe, M., O’Connor, M.B., and Othmer, H.G. (2006). Robust, Zhu, H. (2011). Spatiotemporally modulated Vestigial gradient by Wingless bistable patterning of the dorsal surface of the Drosophila embryo. Proc. signaling adaptively regulates cell division for precise wing size control. Natl. Acad. Sci. USA 103, 11613–11618. J. Theor. Biol. 268, 131–140.

Cell 144, March 18, 2011 ©2011 Elsevier Inc.

Leading Edge Review

Evolution of Gene Regulatory Networks Controlling Body Plan Development

Isabelle S. Peter1,* and Eric H. Davidson1,* 1Division of Biology 156-29, California Institute of Technology, Pasadena, CA 91125, USA *Correspondence: [email protected] (I.S.P.), [email protected] (E.H.D.) DOI 10.1016/j.cell.2011.02.017

Evolutionary change in animal morphology results from alteration of the functional organization of the gene regulatory networks (GRNs) that control development of the body plan. A major mechanism of evolutionary change in GRN structure is alteration of cis-regulatory modules that determine regulatory gene expression. Here we consider the causes and consequences of GRN evolution. Although some GRN subcircuits are of great antiquity, other aspects are highly flexible and thus in any given genome more recent. This mosaic view of the evolution of GRN structure explains major aspects of evolutionary process, such as hierarchical phylogeny and discontinuities of paleontological change.

Introduction

In each generation of each animal species, the body plan is formed by the execution of an inherited genomic regulatory program for embryonic development. The basic control task is to determine transcriptional activity throughout embryonic time and space, and here ultimately lies causality in the developmental process. The genomic control apparatus for any given developmental episode consists of the specifically expressed genes that encode the transcription factors required to direct the events of that episode, most importantly including the cis-regulatory control regions of these genes. The cis-regulatory sequences combinatorially determine which regulatory inputs will affect the expression of each gene and what other genes it will affect; that is, they hard-wire the functional linkages among the regulatory genes, forming network subcircuits. The subcircuits perform biologically meaningful jobs, for example, acting as logic gates, interpreting signals, stabilizing given regulatory states, or establishing specific regulatory states in given cell lineages; here the term "regulatory state" means the total of active transcription factors in any given cell at any given time. In turn the subcircuits are "wired" together to constitute the gene regulatory network (GRN), the genomically encoded developmental control system.

Developmental GRN Structure

As with any operational control system, the structure of a developmental GRN determines its functions. GRN structure has a unique character, and in this Review we return repeatedly to the way these structural characteristics affect the processes by which evolution of the animal body plan occurs. GRNs are inherently hierarchical: the networks controlling each phase of development are assemblages of subcircuits, the subcircuits are assemblages of specific regulatory linkages among specific genes, and the linkages are individually determined by assemblages of cis-regulatory transcription factor target sites. But at the highest level of its organization, the developmental GRN is hierarchical in an additional and, as we discuss below, very important sense. Development progresses from phase to phase, and this fundamental phenomenon reflects the underlying sequential hierarchy of the GRN control system. In the earliest embryonic phases, the function of the developmental GRN is establishment of specific regulatory states in the spatial domains of the developing organism. In this way the design of the future body plan is mapped out in regional regulatory landscapes, which differentially endow the potentialities of the future parts. Lower down in the hierarchy, GRN apparatus continues regional regulatory specification on finer scales. Ultimately, precisely confined regulatory states determine how the differentiation and morphogenetic gene batteries at the terminal periphery of the GRN will be deployed.

Given that developmental GRN structure determines GRN function, and given that derived evolutionary change in animal body plans must occur because of change in the genomic apparatus controlling development, evolution of the body plan must be effected by alterations in the structure of developmental GRNs. A fundamental theme explored in this Review is that most changes in GRN structure are rooted in cis-regulatory alterations, both in principle and in fact. The result of relevant change in GRN structure is derived change in GRN operation, compared to the immediately ancestral GRNs. This will cause changes in developmental process, and ultimately in the product of that process, the body plan (Britten and Davidson, 1971; Davidson and Erwin, 2006; Erwin and Davidson, 2009). The rules of GRN structure/function relations are emerging as analysis of developmental GRNs accelerates, and this in turn provides pathways into evolutionary mechanism that never could have been anticipated in advance. Although a variety of ways of thinking about evolution have been proposed, the evolution of the body plan is fundamentally a system-level problem to which GRN structure/function provides the most compelling direct access.

Evolution at cis-Regulatory Nodes

Because GRN topology is encoded directly in cis-regulatory sequences at its nodes, evolutionary changes in these sequences have great potency to alter developmental GRN

structure and function. However, there are many kinds of cis-regulatory changes that affect function in different ways, ranging from loss of function, to quantitative change in function, to qualitative gain of function resulting in redeployment of gene expression.

Diverse Consequences of cis-Regulatory Mutations

Two general kinds of genomic changes affecting cis-regulatory modules are internal changes affecting sequences within cis-regulatory modules and contextual sequence changes that alter the physical disposition of entire cis-regulatory modules. Table 1 provides a list of both kinds of changes and their possible functional connotations. Of the internal changes in Table 1, note that loss or gain of a given target site might cause either loss or gain of function, depending on whether the factor binding the site is an activator or repressor.

Many internal changes in cis-regulatory sequence will produce quantitative effects only, so long as the qualitatively complete set of interactions is assured by the identity of the target sites. The arrangement, spacing, and number of these sites are relatively insignificant (Balhoff and Wray, 2005; Cameron and Davidson, 2009; Dermitzakis et al., 2003; Liberman and Stathopoulos, 2009; Ludwig et al., 2000; Walters et al., 2008). In a convincing recent example, Hare et al. (2008) showed that >70% of specific Drosophila melanogaster eve stripe 2 sites are not conserved in some other Drosophilidae, even though these modules produce identical output patterns. They all, however, respond to the same qualitative inputs. Furthermore, four different eve cis-regulatory modules (three pair-rule stripe modules and a heart expression module) isolated from flies perhaps 100 million years removed from their last common ancestor with Drosophila were shown to function identically when introduced into D. melanogaster despite extremely different site order, number, and spacing. Another example is found in a comparison of orthologous otx cis-regulatory modules in distantly related ascidians, which again revealed extremely different module organization despite identical spatial regulatory function (Oda-Ishii et al., 2005). These results indicate great freedom of cis-regulatory design, given only the constraint on input identity and of course the requirement that all the relevant sites lie within functional interaction range, in practice usually the several hundred base pairs of the module sequence.

There is, however, one notable exception, namely when there is a high conservation of arrangement of sites found very closely apposed, presumably because the proteins bound to them interact directly with each other or with third parties, for instance Dorsal and Twist sites in multiple Drosophila neurogenic ectoderm genes (Hong et al., 2008), or Otx and Gatae sites in orthologous cis-regulatory modules of echinoderm otx genes (Hinman and Davidson, 2007). Also, many vertebrate cis-regulatory modules are known in which the order of closely packed target sites is conserved, resulting in high levels of sequence identity across cis-regulatory modules that have been evolving separately for 350–450 million years (Elgar and Vavouri, 2008; Pennacchio et al., 2006; Rastegar et al., 2008; Siepel et al., 2005; Vavouri et al., 2007; Wang et al., 2009). Because of this exception to the general rule of relaxed cis-regulatory design, in Table 1 site spacing is considered as a possible cause of input gain or loss. As Table 1 indicates, only those intramodular cis-regulatory sequence changes that produce qualitative gain or loss of target sites can result in the co-option of the respective network node to a new temporal/spatial expression domain and thus in the alteration of functional GRN topology.

An important implication of Table 1 is that contextual (external) cis-regulatory changes of several kinds may be a major source of evolutionary GRN redesign. Co-optive redeployment of cis-regulatory modules can be due to translocation by mobile elements; spatial repression functions can disappear by deletion of whole modules; cis-regulatory recruitment can be altered by functions that tether them to different promoters. In some branches of evolution, duplication of regulatory genes followed by subfunctionalization has been a major source of evolutionary novelty (Jimenez-Delgado et al., 2009; Ohno, 1970). Although it is possible to estimate computationally the rate of single target site sequence appearance and disappearance, or for specific cases observe it, we have virtually no fix on the rates of processes that move cis-regulatory modules into new genomic contexts. Because cis-regulatory modules may be carried around by transposing mobile elements, and because the transposition of mobile elements is the most rapid type of large-scale genomic sequence change in animal genomes, this is likely to be a major mechanism of GRN evolution. In human, mouse, and Drosophila, estimates suggest insertion rates for certain types of mobile elements on the order of 10⁻¹ per genome generation (Garza et al., 1991; Ostertag and Kazazian, 2001), and it is clear that there have been great bursts of mobile element insertion in the evolutionary history of many animal lineages including our own (e.g., Ohshima et al., 2003; Ostertag and Kazazian, 2001). DNA transposons, long-terminal repeat (LTR)-containing retrotransposons, and non-LTR-containing retrotransposons, both autonomous and nonautonomous (the latter meaning that enzymatic machinery from another retrotransposon is required for mobility), are all capable of altering genomic sequence. Their various excision, copy, and integration mechanisms lie beyond the scope of this paper (for reviews, see Gogvadze and Buzdin, 2009; Kazazian, 2004); suffice it to say that the diverse types of rearrangements they cause may directly affect transcriptional processes, positively or negatively.

The LTRs of retrotransposons have intrinsic cis-regulatory activity and, when transposed into the vicinity of a gene, may cause its transcription (Gogvadze and Buzdin, 2009). In mammals, non-LTR retrotransposons (such as L1 in humans) have the ability to mobilize nonautonomous mobile elements (such as Alu repeats in humans), and these frequently carry with them adjacent sequence elements. Thus Alu repeats have apparently picked up cis-regulatory apparatus during their nonautonomous transpositions and moved them to the locations of new genes, and in addition their own sequence may mutate to produce cis-active transcription factor target sites, as shown in a number of specific examples (for review, Britten, 1997).

A very important aspect of this mode of cis-regulatory target site insertion has recently been emphasized with the observation that, on a genome-wide basis, many such sites are species (or genus or order) specific (e.g., Odom et al., 2007). An excellent case in point is a recent study of sites recognizing the neural repressor REST (Johnson et al., 2009) where it is clear from comparison among mammalian genomes that primate-specific

sites have been inserted in recent evolutionary time all over the genome by Alu and L1 transposition, though most of the primate-specific sites are (as yet) probably functionless. In another case, a non-LTR retrotransposon has inserted an auto- and cross-regulatory site into a duplicate copy of the dmrt sex control gene in Medaka within the last 10 million years, which generates a functional species-specific control circuit determining developmental interplay between these two genes (Herpin et al., 2010). In summary, as previously speculated (Britten and Davidson, 1971), mobile elements could have provided a major mechanism of GRN evolution. They have the potential to produce exactly the kinds of genomic cis-regulatory change that a priori might be the most potent mechanisms for GRN change, that is, gain-of-function co-options of regulatory gene expression (Table 1).

Table 1. Evolutionary Alterations in cis-Regulatory Modules and Their Possible Functional Consequences
Column headings: Effect of Change at Sequence Level; Loss of Function; Quantitative Output Change; Input Gain/Loss within GRN; Gain of Function, Co-optive Redeployment to a New GRN
Appearance of new target site(s) (a): XXXX
Loss of old target site(s) (a): XXXX
Change in site number (a): X
Change in site spacing (a): XXX
Change in site arrangement (a): XXX
Translocation of module to new gene (b): XXX
Module deletion (b): X
New tethering function (b): XXX
Duplication, subfunctionalization (b): X
GRN, gene regulatory network.
(a) Internal change in cis-regulatory module sequence.
(b) Change affecting genomic context of cis-regulatory module.

Evolution by cis-Regulatory Gain of Function

Evolutionary change in GRN structure may follow directly from qualitative gain of cis-regulatory linkages among regulatory and/or signaling genes. If the phenotypic functionality of this type of evolutionary process were to require the homozygosity of the underlying DNA alteration, as in classic microevolutionary theory, GRN evolution would be essentially inconceivable. But in fact phenotypic functionality of a co-optive change in regulatory gene expression will not depend on homozygosity. As initially pointed out by Ruvkun et al. (1991) and further discussed by Davidson and Erwin (2010), gain-of-function cis-regulatory co-options that produce regulatory gene expression in new domains act dominantly, and this has fundamental consequences for evolutionary process. Thousands of routine lab experiments in which regulatory systems are systematically redesigned to produce ectopic expression show that for most regulatory genes, particularly in early development, a single copy of the gain-of-function allele produces the regulatory effect.

The potency of a cis-regulatory gain-of-function co-option for altering GRN structure is easily imagined, as in the cartoon of Figure 1. Here we see how the co-option could have occurred by addition of sites to a pre-existing cis-regulatory module, or by insertion of a new cis-regulatory module, and then how this co-option could alter function downstream of the GRN. In the examples of Figure 1, the regulatory gene newly incorporated in the GRN might control the deployment of a signal system, or of a differentiation gene battery, but of course it could have many different effects.

There is an intrinsically high possibility of evolutionary reorganization of GRN structure by cis-regulatory gain-of-function co-options, given the general rapidity of cis-regulatory evolution and the haplodominance of gain-of-function changes in regulatory gene expression. Any organism in which such a change had occurred in either the maternal or paternal germline would, if viable, become a clonal founder (Davidson and Erwin, 2010). A cis-regulatory gain-of-function event of any of the kinds listed in Table 1 could have an immediate operational effect on a GRN, if a newly incident addition to a regulatory state caused additional GRN subcircuits to be deployed (as in Figure 1B). Or it could perhaps result in a regulatory gene expression that is for the moment functionless, though harmless, but which could later become functional when additional co-optive events add to the regulatory state other factors with which the first can cooperate combinatorially, or when additional cis-regulatory changes provide new functional targets. An almost revolutionary revision emerges from the realization that GRN function can change in creative ways by mechanisms that are likely rather than unlikely to occur; that will be dominant and haplosufficient when they do occur; and that may be driven by a plethora of diverse processes at the cis-regulatory DNA level, some of which continuously or stochastically alter genomes with relatively high frequency. Periods of rapid evolutionary change may be thought of in these terms, but this also raises the obverse question: we now need an explanation for the paleontological demonstration of very long periods of evolutionary stasis in the basic body plans of many animal lineages.

The Hierarchical Organization of Developmental GRNs

Knowing that the basic events causing GRN evolution are cis-regulatory alterations, particularly those resulting in qualitative additions to or subtractions from the developmental regulatory

Figure 1. Regulatory Gene Co-option and Possible Consequences
The diagram shows cis-regulatory mutations that could result in co-optive change in the domain of expression of a regulatory gene and consequences at the level of gene regulatory networks (GRNs).
(A) Co-option event: The gene regulatory networks operating in spatial Domains 1 and 2 produce different regulatory states (colored balls, representing diverse transcription factors). A cis-regulatory module of Gene A, a regulatory gene, has target sites for factors present in the Domain 1 regulatory state and so Gene A and its downstream targets are expressed in Domain 1, but not in Domain 2 where only one of the three sites can be occupied. Two alternative types of cis-regulatory mutations are portrayed: appearance of new sites within the module by internal nucleotide sequence change; and transposition into the DNA near the gene of a module from elsewhere in the genome bearing new sites. Although these gain-of-function changes do not affect the occupancy of the cis-regulatory sites of Gene A in Domain 1, the new sites allow Gene A to respond to the regulatory state of Domain 2, resulting in a co-optive change in expression so that Gene A is now active in Domain 2 (modified from Davidson and Erwin, 2010).
(B) Gain-of-function changes in Domain 2 GRN architecture caused by co-option of Gene A: Gene A might control expression of an inductive signaling ligand, which could alter the fate/function of adjacent cells now receiving the signal from Domain 2 (left); Gene A might control expression of Gene B, another regulatory gene, and together with it cause expression of a differentiation (D) gene battery, which in consequence of the co-option is now expressed in Domain 2 (right).
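The co-option event described in the Figure 1A legend can be made concrete with a toy Boolean sketch. This is only an illustration under stated assumptions, not a model taken from the Review: the factor names F1 to F5 are hypothetical, each cis-regulatory module is assumed to fire only when all of its target sites are occupied (a strict AND rule), and a gene is assumed to be expressed if any one of its modules fires. Only the transposition route (a new module arriving near Gene A) is modeled.

```python
# Toy Boolean sketch of the co-option event in Figure 1A.
# All factor names (F1..F5) and the AND/OR rules are illustrative assumptions.

# Regulatory state = the set of transcription factors active in a domain.
DOMAIN_1 = frozenset({"F1", "F2", "F3"})
DOMAIN_2 = frozenset({"F3", "F4", "F5"})  # shares only F3 with Domain 1


def module_active(required_factors, regulatory_state):
    """Assumed AND rule: a module fires only if every target site is occupied."""
    return set(required_factors) <= set(regulatory_state)


def gene_expressed(modules, regulatory_state):
    """Assumed OR rule: a gene is expressed if at least one module fires."""
    return any(module_active(m, regulatory_state) for m in modules)


# Ancestral Gene A: one module with three target sites, all met only in Domain 1.
ancestral_gene_a = [{"F1", "F2", "F3"}]

# Co-opted Gene A: a module bearing sites for Domain 2 factors has been
# transposed into the DNA near the gene; the original module is untouched.
coopted_gene_a = ancestral_gene_a + [{"F4", "F5"}]

for label, modules in [("ancestral", ancestral_gene_a), ("co-opted", coopted_gene_a)]:
    for domain_name, state in [("Domain 1", DOMAIN_1), ("Domain 2", DOMAIN_2)]:
        print(label, domain_name, gene_expressed(modules, state))
```

Under these assumptions the ancestral gene is on in Domain 1 and off in Domain 2, while the co-opted gene is on in both; its Domain 1 expression is unchanged because the original module still sees the same inputs.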
state, we can sharpen the question we are asking: how do the structural properties of GRNs affect the developmental consequences of such cis-regulatory alterations?

The Consequences of Hierarchical GRN Structure

As discussed above, the GRNs controlling embryonic development of the body plan are intrinsically hierarchical, essentially because of the number of successive spatial regulatory states that must be installed in the course of pattern formation, cell-type specification, and differentiation. This property of GRNs fundamentally affects the way we need to consider the question just put. The consequences of any given cis-regulatory mutation will depend entirely on where in the GRN hierarchy the affected cis-regulatory node lies. As Figure 2 shows, changes that occur in the cis-regulatory control apparatus of a given differentiation gene could cause redeployment of that gene; changes in the cis-regulatory system determining expression of a controller of the battery could cause redeployment of the whole battery; changes upstream of that could affect redeployment of whole regulatory states, or of many other features. The circuitry drawn in Figure 2 is of course arbitrary but its import is general. So in order to understand predictively the effect of a given cis-regulatory change, the GRN architecture and the position of the mutation therein must be known. This may seem a demanding requirement, but from the point of view of understanding evolution mechanistically, it places a powerful lever in our hands. First, it should enable a rational interpretation of evolutionary differences in development between related animals in terms of GRN structure (we consider examples below); second, in principle it could enable predicted effects to be tested experimentally by inserting the cis-regulatory change into a related form expressing the plesiomorphic GRN, termed "synthetic experimental evolution" (Erwin and Davidson, 2009).

Another direct evolutionary consequence of GRN hierarchy has also been discussed (Davidson and Erwin, 2006, 2009), and this is the phenomenon of canalization. In developmental terms the establishment of a spatial regulatory state constrains subsequent processes: like a decrease in entropy, the number of possible regulatory states downstream is now decreased. If the regulatory state defines a progenitor field for a given organ, then all the subsequent stages in the development of that organ must take place within that domain. As in development so in evolution, and thus a co-optive mutation leading to qualitative evolutionary reorganization at cis-regulatory nodes of an upper-level GRN subcircuit is much more likely to entail numerous deleterious problems downstream than if the change were to occur further down in the hierarchy. Therefore upper levels of GRN hierarchy are much less likely to change once a hierarchical GRN has evolved than are more peripheral levels, and this is the empirical mark of the classical canalization phenomenon.

Currently, no GRN is analyzed to a degree that we know its linkages and functions from its upstream to downstream

peripheries, that is, from the beginning of the developmental process to the terminal differentiated state. We do know, however, that the GRN output is observable as individual gene expression patterns and, ultimately, as the developmental process. We can use these outputs to infer a framework within which to position individual regulatory subcircuits or evolutionary changes within the hierarchical GRN. To facilitate the discussion on GRN evolution we now define GRN parts according to the developmental functions they control and then go on to consider abstractly the impact of evolutionary changes occurring in each of these parts.

Figure 2. Evolutionary Consequences of cis-Regulatory Mutations
Functional evolutionary consequences of cis-regulatory mutations depend on their location in gene regulatory network (GRN) architecture. A GRN circuit encoding the control system of a differentiation gene battery (bottom tiers) activated in response to a signal from adjacent cells (top tier); linkages are in blue, red, and green. The double arrow indicates signal reception and transduction causing gene expression in the recipient cells. Note that the middle tier of circuitry consists of a dynamic feedback stabilization subcircuit. The numbered red "x" symbols denote mutational changes in the cis-regulatory modules controlling expression of these genes, keyed by number to the functional consequences listed in the box below. Loss-of-function mutations (1 and 2) are indicated in green, and co-optive gain-of-function mutations (3 and 4) resulting in expression of the affected gene in a new domain, as in Figure 1A, are indicated in blue (modified from Erwin and Davidson, 2009).

As shown in Figure 3, we can distinguish four causally connected developmental functions that are encoded by sections of the GRN represented by Boxes I–IV. The most upstream part of the GRN indicated in Box I controls postgastrular pattern formation. It is animated by pregastrular spatial and signaling inputs (maternal anisotropies, maternal factors, early interblastomere signals, all used as directional cues, and then by the outputs of the initial zygotic GRNs). The functions of the GRNs set up in this phase of development, including their signaling interactions, are to establish broad domains that section the organism with respect to the major body axes. The immediate output of the GRNs of Box I is to set up regulatory state domains within spatially defined areas of the organism. These domains, such as the neuraxis or mesodermal layers, constrain the position of future body parts and also now provide initial regulatory inputs that will be utilized in subsequent derivatives of their territories. The fate patterns they produce are often broadly conserved within clades (the early postembryonic "phylotype"). In Box II, progenitor fields for specific body parts (for example, the heart progenitor field or the limb bud) are defined within these early domains. These are sets of cells each expressing the specific GRNs indicated at the level of Box II. The progenitor field then must be subdivided into regions that give rise to the future constituent pieces of the body part, each of which is foreshadowed by a new GRN (for example, the aorta or ventricle of the heart or the autopod of the limb). Within Box III thus lie the GRNs that control both the identity and the spatial boundaries of these subparts. This patterning GRN thus implements a coordinate system within the progenitor domain that is crucial for morphology and function of the body part. Both patterning GRNs (e.g., Box I and Box III) are oriented along the same axes, and the downstream body-part-specific patterning GRN therefore depends at least indirectly on the upper-level postgastrular patterning GRNs. Depending on the complexity of the body part, multiple rounds of spatial regulatory state subdivision and installation of further regional GRNs may be required. Thus, the progression from Box II- and Box III-type GRNs may be reiterated (backward arrow in Figure 3). Only following these patterning processes, the terminal cell-fate specification GRNs (Box IV) become activated in spatially restricted domains within the body part progenitor field. At the lower periphery of developmental GRNs are the differentiation gene batteries, that is, the protein-coding effector genes plus their immediate transcriptional regulatory drivers.

What kinds of subcircuit topologies are found at these different levels of GRN hierarchy? So far, a number of GRNs have been elaborated that indicate the recurrent use of subcircuits in given developmental contexts (Peter and Davidson, 2009). One such subcircuit, the positive feedback subcircuit, links two or more regulatory genes by multiple activating regulatory interactions and acts to stabilize regulatory states. This is necessary in body-part-specific GRNs (Box II) or cell-fate GRNs (Box IV), given that pattern formation processes usually occur only in a limited temporal window. Recurrent activating linkages keep the genes expressed even when the initial activating regulatory input fades. A positive intercellular feedback subcircuit can result in a "community effect" (Bolouri and Davidson, 2010), the stabilizing activation of similar regulatory states within a field of cells. Here a gene encoding an intercellular signaling ligand is expressed under the control of the same signal transduction system it activates. The pattern-forming GRNs of Box I and Box III in Figure 3, in contrast, operate largely by means of transient signal inputs as well as repressive exclusion functions that control spatial subdivision. Patterning processes are not concerned with stabilization or homogenization of regulatory states, and they contain few positive feedback loops. The biological function of individual subcircuit topologies predicts the probability of its occurrence at specific positions within the GRN hierarchy.

If one had to predict the GRN parts most likely modified in the evolution of body plans, a place to begin would be to define where in the developmental process and therefore in the GRN hierarchy differences occur. Morphological differences between species of

different phyla affect the basic body plan, the overall organization of the organism. During development, the body plan is established mainly by the upstream embryonic patterning mechanisms and the individual body-part specification programs that they activate in given positions. Phylum-level morphological differences are therefore expected to occur in the GRNs underlying Boxes I and II. Among classes within the same phylum, the position with respect to the body axes or the internal structures of individual body parts may differ. Differences in the positions of body parts relative to each other could occur even when embryonic patterning GRNs and body-part specification GRNs are conserved, simply by rewiring the connections between these functions (such as the linkages connecting Box I and Box II; see also the discussion of hox gene functions below). This could result in alterations in the positions of given body parts. Morphological differences within body parts are more likely to be caused by differences in the spatial assignment of cell-fate domains determined by the body-part patterning GRNs of Box III. Based on these arguments one would expect that mutations in regulatory linkages within the patterning functions are more likely to be the cause of morphological changes, whereas specification GRNs active within given cell types or body-part progenitor fields are more likely to be conserved.

Given the predicted prevalence of specific network topologies for given biological functions, there might be a direct correlation between regional network topology and rate of evolutionary change. Regulatory linkages used for patterning embryos or body parts frequently rely on inductive signals that connect GRNs underlying specification in different domains and ensure orchestrated progression of development. In organisms of different spatial geometry, inductive signaling relationships will differ, and thus, inductive signaling interactions are likely to show a higher rate of evolutionary change. Indeed they do, as discussed elsewhere (Davidson and Erwin, 2006; Erwin and Davidson, 2009). The high level of conservation of positive feedback subcircuits has been previously proposed in the Kernel theory of Davidson and Erwin (2006). These Kernels consist of a few regulatory genes linked by recursive positive regulatory interactions, and they are usually used upstream in GRNs that control the specification of progenitor fields for particular body parts, and they are conserved at large evolutionary distances.

In summary, evolution of GRNs to produce new developmental outcomes must involve new subcircuit deployments. This places a premium on co-optive change at the switches, signals, and inter-subcircuit inputs that encode subcircuit deployment. Evolution of new developmental GRN features must thus proceed to some extent as a process in which diverse subcircuits are combined, recombined, activated, and inactivated in given spatial domains of the embryo.

Evolution by Regulatory Changes in Single Genes

Though the jobs of development require the outputs of multigene subcircuits of given topologies, we see from the above that there are points of “flexibility” in developmental GRNs, where co-optive gain-of-function, or loss-of-function, regulatory changes may have large effects. By focusing on naturally occurring variations between closely related animals where visible evolutionary change has occurred recently, the most evolutionarily flexible aspects of the regulatory system are revealed. In the examples that follow, in which single genes are responsible for the changes observed, it has furthermore been possible to obtain experimental evidence for the evolutionary mechanism underlying the phenotypic variation in form.

Genomic Basis of Rapid Evolutionary Trait Loss

A canonical example, recently elaborated at the sequence level, and causally confirmed by experiment, is reduction of pelvic spines in stickleback fish. Following the end of the last Ice Age, marine stickleback fish were marooned in multiple lakes formed as the glaciers melted, and during the last 10,000–20,000 years independent populations of two different genera of these fish have repeatedly lost external pelvic spines. The exact selective advantages of pelvic reduction and spine loss are not defined, but as it has happened many times independently, there clearly are some (Shapiro et al., 2006 and references therein). Genetic complementation tests show that diverse isolates bear the same or overlapping genetic lesions, and this is so even in crosses of species from different genera displaying the same spine reduction phenotype. The underlying genomic event turns out to be deletion of a cis-regulatory module that controls expression of the pitx1 regulatory gene in the pelvic buds during larval development (Chan et al., 2010). Most significantly, when this cis-regulatory module was cloned upstream of a sequence encoding the Pitx1 protein and introduced into reduced spine fish, it rescued the spineless phenotype. The cis-regulatory module lies in an unstable, repetitive sequence-filled genomic region, possibly accounting for its repeated deletion (Chan et al., 2010). The pitx1 gene is clearly involved in pattern formation functions upstream of pelvic girdle specification, and in spineless fish there is no pitx1 expression in the pelvic buds even though the coding region of the gene is intact (Cole et al., 2003; Shapiro et al., 2006). In amniotes pitx1 operates in the patterning system that organizes the subparts of the appendages developing from the hindlimb buds, and forced expression in forelimb buds transforms them into hindlimbs (Logan and Tabin, 1999; Szeto et al., 1999). Thus this gene operates upstream in a portion of the GRN, the function of which is to generate the spatial regulatory states that presage the parts of the amniote hindlimb, and also of the pelvis, which is rudimentary in mice deficient in pitx1 (Szeto et al., 1999). Though pitx1 could execute more downstream roles in pelvic skeletal formation as well, its expression prior to the terminal phases of pelvic skeletogenesis indicates that it also functions in a Box III body-part-specific patterning GRN in stickleback fish.

However, rapidly evolving, reduced, or regressive phenotypes can be due to gain-of-function as well as loss-of-function mutations. The Mexican cave fish Astyanax exists both in riverine surface waters and in various cave populations that became isolated about 10,000 years ago, and the regressively evolved traits of the cave populations have been studied for over a half century. A recurrent change in cave Astyanax is degeneration of eyes during larval development. During embryogenesis of cavefish, the eyes initially develop similarly to those of surface conspecifics, including expression of many regulatory genes (Jeffery, 2005, 2009). But then many things go wrong in eye development including apoptotic degeneration of lens and retina. A cause is ectopic spatial expression of sonic hedgehog

(shh) from the normal medial interocular region across the top of the ocular fields in cave fish. As shown experimentally by introduction of shh mRNA in surface Astyanax, excess Shh causes expression of transcription repressors (vax1 and pax2a), which interfere with pax6 expression and thus the downstream pax6 ocular patterning subcircuit (Jeffery, 2009; Yamamoto et al., 2004; Baumer et al., 2002). Also, excess Shh indirectly promotes apoptosis in lens and retina. Though yet undefined at the sequence level, in cave Astyanax, regulatory changes have evidently caused a spatial gain of function in shh transcription resulting in regression of the eyes.

The simplest cases of evolutionary trait loss are deleterious mutations in far downstream differentiation genes. Pigmentation is among the regressive traits in cave Astyanax. Two pigmentation phenotypes have been shown to be due to mutations in the protein-coding sequences of receptors directly involved in pigmentation, oca2 (Protas et al., 2006) and mc1r (Gross et al., 2009). However, in stickleback fishes where there is also loss of pigmentation in lacustrine forms, cis-regulatory changes rather than coding region mutations are responsible (Miller et al., 2007). Here the gene responsible encodes kit ligand (Steel factor) and this gene has pleiotropic effects, so that total loss of function would be severely deleterious. Loss of function in a single cis-regulatory module, on the other hand, has specific effects that under certain conditions are adaptive. Because this is a general feature of cis-regulatory versus coding sequence mutations, it predicts that evolutionary changes in any pleiotropically active gene, as are most regulatory genes, will generally target specific cis-regulatory modules (as discussed, for example, by Chan et al., 2010; Miller et al., 2007; Prud’homme et al., 2006). Inverting this argument, we see a powerful evolutionary explanation for the modularity generally typical of the cis-regulatory systems controlling expression of regulatory and signaling genes in animal genomes (Davidson, 2001, 2006). GRN evolution by regulatory gain and loss of function of expression of these genes would be utterly impossible were these control systems not in general modular, given that almost all such genes function in multiple time-space compartments, and in multiple GRNs during development. Physical and functional modularity in the control systems of regulatory genes is thus among the fundamental characteristics of animal genomes that permit and, indeed, that produce evolution of development by GRN reorganization.

Morphological Variation due to Single-Gene Regulatory Changes

Whereas the foregoing concerns rapidly occurring evolutionary changes in single-gene functions that are of adaptive significance, we now face a conundrum. How do we extrapolate from recent evolutionary events to the much more ancient processes by which order- and class-level differences in body plan arose, let alone phylum-level differences?

Recent studies focusing on the adaptive evolution of external traits in and among Drosophila species have revealed processes of cis-regulatory sequence microevolution. Such processes account for variation in pigmentation patterns due to regulatory changes affecting expression of the yellow differentiation gene (Gompel et al., 2005; Rokas and Carroll, 2006) and the ebony differentiation gene (Rebeiz et al., 2009). Similarly, cis-regulatory evolution in the shavenbaby (ovo) regulatory gene, which controls the differentiation and morphogenesis of trichomes (short hair-like surface appendages), determines where this gene is expressed, and thereby the minute pattern differences in trichome distribution distinguishing Drosophila species (McGregor et al., 2007). These studies afford multiple real examples of cis-regulatory site addition, and quantitative as well as qualitative cis-regulatory gain and loss of function due to internal DNA sequence change (see Table 1). They provide general and specific indication of the flexibility and changeability of cis-regulatory modules in local evolution, at the level of function and deployment of differentiation gene batteries, the lowest level in the hierarchy of Figure 3.

Mechanistic studies of intra- and interspecific evolutionary variation illuminate the next level up as well, that is, evolutionary changes (other than simple loss of function) in the Box III-type pattern formation GRNs that determine the morphological characteristics of given body parts. The results have thus far often resolved into demonstration of alterations in the deployment of signal systems in the development of these parts; that is, the underlying evolutionary change is in the cis-regulatory apparatus controlling time and place of inductive signaling, just as predicted earlier. The causal developmental mechanism underlying the adaptively diverse beak morphologies of Darwin’s classic series of Galapagos finch species was solved in these terms by Abzhanov et al. (2006, 2004). Species with heavy beaks displayed earlier and higher expression of bone morphogenetic protein 4 (BMP4) in pre-beak neural crest mesenchyme, and species with elongated, pointed beaks expressed Ca²⁺/calmodulin at higher levels, indicating that beak length depends on extent of Ca²⁺ signaling. Remarkably, experimental overexpression of BMP4 by retroviral gene transfer into developing frontonasal tissues of chicken embryos produces robust beaks, and experimental overexpression of the downstream mediator of Ca²⁺ signaling, CaMKII, produced elongated beaks, confirming the causality. To take another example, a recent study shows that short legs in dog breeds such as dachshunds and basset hounds are due to a retrogene encoding fibroblast growth factor 4 (FGF4), inserted and evidently controlled by cis-regulatory elements carried in non-LTR transposons (Parker et al., 2009).

Changes in upstream patterning apparatus can account for differences in body plan at inter-ordinal to inter-class levels, and such changes are not found in comparing organisms that diverged only a few million or a few thousand years ago or less. For example, one of the characters distinguishing bats and rodents, which are of different mammalian orders and in fact belong to different super-orders, is the much longer relative length of the forearm skeleton in bats. A candidate regulatory gene known to affect limb skeletal elongation is prx1 (mhox), and in bats this gene is upregulated after the early limb bud stage compared to mice (Cretekos et al., 2008). The (indirect) causality of this change was then demonstrated by inserting the bat prx1 limb enhancer into the mouse gene, with the result that the forelimbs of the recipient mouse now develop with relatively longer dimensions. In an essentially similar case, the tbx5 gene, deeply embedded in the vertebrate heart formation GRN (for review, Davidson, 2006), turns out to be regulated differently during heart formation in reptiles than in birds and mammals,

a class-level difference. Expression of this gene is confined to the left ventricle in the developing amniote heart but is expressed across the common ventricle in the three-chambered reptile heart (Koshiba-Takeuchi et al., 2009). When uniform tbx5 expression is forced in the mouse heart, or left ventricle tbx5 expression is prevented, that is, if a reptilian tbx5 spatial regulatory expression is imposed, the mouse develops a three-chambered heart lacking an interventricular septum. Understanding of developmental GRN structure tells us that these examples differ from the foregoing in that they imply the existence of Box III GRN subcircuits in which the targeted genes participate. In contrast, in the peripheral gene examples above, the phenotype is wholly encompassed by changes in a single cis-regulatory system.

Figure 3. Hierarchy in Developmental Gene Regulatory Networks. The diagram shows a symbolic representation of hierarchy in developmental gene regulatory networks. The developmental process begins with the onset of embryogenesis at top. The outputs of the initial (i.e., pregastrular) embryonic gene regulatory networks (GRNs) are used after gastrulation to set up the GRNs, which establish regulatory states throughout the embryo, organized spatially with respect to the embryonic axes (axial organization and spatial subdivision are symbolized by orthogonal arrows and colored patterns). These spatial domains divide the embryonic space into broad domains occupied by pluripotent cell populations already specified as mesoderm, endoderm, future brain, future axial neuroectoderm, non-neural ectoderm, etc. The GRNs establishing this initial mosaic of postgastrular regulatory states, including the signaling interactions that help to establish domain boundaries, are symbolized as Box I. Within Box I domains the progenitor fields for the future adult body parts are later demarcated by signals plus local regulatory spatial information formulated in Box I, and given regulatory states are established in each such field by the earliest body-part-specific GRNs. Many such progenitor fields are thus set up during postgastrular embryogenesis, and a GRN defining one of these is here symbolized as Box II. Each progenitor field is then divided up into the subparts that will together constitute the body part, where the subdivisions are initially defined by installation of unique GRNs producing unique regulatory states. These “sub-body part” GRNs are symbolized by the oriented patterns of Box III. Because some body parts are ultimately of great complexity, the process of patterned subdivision and installation of successively more confined GRNs may be iterated, like a “do-loop,” symbolized here by the upwards arrow from Box III to Box II, labeled n ≥ 1. Toward the termination of the developmental process in each region of the late embryo, the GRNs specifying the several individual cell types and deployed in each subpart of each body part, are symbolized here as Box IV. Postembryonic generation of specific cell types (from stem cells) is a Box IV process as well. At the bottom of the diagram are indicated several differentiation gene batteries (“DGB1, 2, 3”), the final outputs of each cell type. Morphogenetic functions are also programmed in each cell type (not shown). For discussion and background, see text and Davidson, 2001, 2006.

Hox Gene Functions in Upper-Level GRN Patterning Systems

Genes of the trans-bilaterian hox complexes have been the subject of a vast amount of phenomenological research, which has revealed the many and various effects on developmental morphology of hox gene knockouts or ectopic hox gene expression. The variety of effects precludes any simple interpretation of the functions of these genes in terms of developmental GRN structure, for the simple reason that they work at diverse levels. Studies of direct hox gene targets reveal both other regulatory genes and far downstream genes encoding proteins active in apoptosis, cell-cycle control, cell adhesion, cell polarity, non-canonical signaling, and cytoskeletal functions (Cobb and Duboule, 2005; Hueber and Lohmann, 2008; Pearson et al., 2005). However, Hox genes are most famous for their developmental effects on the placement and the internal organization of body parts. The most important evolutionary and developmental attributes of hox gene complex function can be reduced to two statements: first, in organisms in which coherent hox complexes exist they are expressed in a vectorial or sequential fashion with respect to the coordinates of the body plan or the body part; and second, they can act as switches that allow (or activate) GRN patterning subcircuits in given locations of the body plan or body part, or alternately they prohibit (or repress) these subcircuits in given locations.

The genomic organization of hox gene clusters indicates that distinct mechanisms account for the locations in the body plan where individual hox genes are expressed in development. In Drosophila a plethora of cis-regulatory modules control each aspect of expression of each gene. Particularly well-known at the cis-regulatory level is the bithorax region (Ho et al., 2009; Maeda and Karch, 2009; Simon et al., 1990). Each specific hox gene enhancer responds to local upstream regulatory states that are the product of earlier developmental GRNs, just as in

any other developmental process. Similarly, many very well-characterized cis-regulatory modules that control very specific spatial and temporal aspects of anterior hox gene expression are known in mammals, and often conserved to fish (Tumpel et al., 2009). The prevalence of local cis-regulatory hox gene control modules explains how these genes can function in animals that lack large hox gene clusters. It is interesting that hox genes are not required for embryonic development of organisms that utilize fixed cell lineages for specification (Davidson, 1990), for instance in C. elegans, which lacks both a coherent hox complex and many hox genes (Aboobaker and Blaxter, 2010); in sea urchins (Martinez et al., 1999); or in Ciona, which also lacks a coherent hox complex (Ikuta et al., 2010). However, in addition to control by local enhancers, another entirely different mechanism that speaks directly to both the evolutionary maintenance of the hox gene cluster(s) and the vectorial expression of hox genes relative to one another has come to light in mammals and other tetrapods.

Over the last decade, transcriptional control of the mouse hoxd complex has been extensively examined by deletions, rearrangements, and insertions of reporter transgenes, including ectopically positioned hox genes at various locations in the complex (Herault et al., 1998, 1999; Kmita et al., 2000; Spitz et al., 2003; Tarchini and Duboule, 2006). To summarize very briefly, early expression in the tetrapod limb bud is controlled not only by local enhancers but also by distant regulatory regions located outside the hox gene clusters. One of these operates from the 3′ (anterior) end of the cluster and causes the progressive expression of first anterior and then middle hox genes in the limb bud region that will give rise to the forearm. Meanwhile the posterior hox genes are repressed by a counteracting locus control region operating from beyond the 5′ end of the complex in the anterior cells of the early limb bud, allowing expression of these genes only in the posterior limb bud cells. A second phase of hoxd expression is controlled by other complex distant enhancers located 200 kb away from the 5′ end of the cluster, which are required to pattern the autopod region of the tetrapod limb where the digits form (Tarchini and Duboule, 2006). This “global control region” (GCR) is responsible for a graded expression of the five posterior hox genes across the anterior/posterior (A/P) dimension of the autopod. The GCR probably had an ancient role in controlling colinear expression in the central nervous system, a basal axial organization function that in terms of our Figure 3 would reside somewhere in Box I; part of the active GCR elements are conserved from fish to mammals. However, some limb-specific elements of the GCR likely evolved in tetrapods, particularly the autopod control device and its patterning GRN, which would make the autopod a novel evolutionary invention with respect to the fish antecedents (Gonzalez et al., 2007; Woltering and Duboule, 2010). More generally, it is an interesting speculation that distant hox complex control regions were superimposed during chordate evolution (they are absent from Drosophila), and control by local hox gene enhancers was the primal regulatory mode (Spitz et al., 2001). However, because the regulatory landscape to which the local enhancers must respond can be very different in different organisms, they themselves must have evolved in clade-specific ways.

Given these systems, deeply conserved and otherwise, by which hox gene expression is regionally controlled, we come to their mode of interaction with the GRNs that control development of specific body parts. Sometimes individual hox genes act by participating, like any other regulatory genes, in patterning GRNs, for example in early hindbrain specification, a Box I function. Together with other important regulatory genes such as krox and Kreisler, the anterior group hoxa and hoxb genes establish recursively wired, extremely conserved, rhombomere-specific GRNs (Tumpel et al., 2007, 2009). But more often they operate in another, evolutionarily flexible way, such that change in their functions has been directly correlated, in many comparative observations, with evolutionary change in both the positioning and organization of body parts.

Not all body parts require the vectorial patterning function of the hox gene complex; for example, hox genes are not expressed in the midbrain or forebrain of vertebrates and they have nothing to do with the specification of the extremely complex regional regulatory states installed during midbrain or forebrain development. Where vectorial inputs are required, hox genes intervene in local, mid-development, patterning functions (Box III). Here we can rely on a number of specific examples. These are of immediate evolutionary significance in that the developmental outcomes that they control vary sharply among related clades. For example, the tetrapod limb bud is a “new” evolutionary invention, dating to the emergence of vertebrate forms onto land. Development of the limb depends directly on deployment of hox gene expression at several levels of the underlying GRN. The early expression of 5′ hox genes at the posterior margin of the bud causes expression of the shh gene in these cells, ultimately setting up anterior and posterior regulatory states in the limb bud (Zakany et al., 2004). Posterior 5′ hox gene expression can be thought of as a switch activating the responsible circuitry. Later, during the autopod expression phase, the GCR responds in turn to graded levels of Shh contributing to the nested pattern of hox gene expression in the autopod, and the GCR can be thought of as a node in the patterning network.

Another example concerns the axial skeleton in vertebrates, which vary greatly in the distribution of vertebral morphologies, again a developmental function of hox gene expression patterns. It is now possible to state just which sets of vertebrae require hox5PG (paralog group), hox6PG, hox9PG, hox10PG, and hox11PG (for review, Wellik, 2009). These relationships can all be interpreted in one simple way. For each type of vertebra (cervical, rib-bearing thoracic, lumbar, etc.), there is a specific patterning GRN operating at the Box III level, and the products of (often) two adjacent PGs allow it to be activated in the right place along the axis or may cause it to be activated ectopically when these hox genes are activated ectopically. That is to say, these hox genes act as regionally active switches that we can imagine sitting on the outside of the Boxes containing the morphogenetic patterning GRNs. Switch behavior is particularly easy to perceive when the switch acts negatively: thus the PG10 hox genes prevent rib formation, normally used to preclude ribs on the lumbar vertebrae; if expressed ectopically no ribs form, and in complete loss of function ribs form almost everywhere (Carapuco et al., 2005; Vinagre et al., 2010; Wellik and Capecchi,

2003). On the other hand, hox6PG genes promote rib formation. The autonomy of these hox-driven switches, as shown by the complete ectopic production of one or another vertebral type in gain-of-function experiments, implies a useful evolutionary mechanism for variation in axial skeletal proportions. Indeed, comparative observations show that different vertebrate classes have hox spatial expression domains that correlate with the axial morphology (examples reviewed in Davidson, 2006). However, the most severe axial changes in tetrapod evolution, those responsible for the body plans of snakes and reptiles, have involved more than merely upstream regulatory changes affecting hox gene expression domains (Di-Poi et al., 2010; Woltering et al., 2009): in addition, the sequences of some of the genes themselves have changed, regulatory linkages between gene expression and effects such as the hox10 inhibition of rib formation have been broken, and numerous transposon insertions have altered the genomic structure of the posterior hox cluster possibly affecting their spatial regulation.

The mechanism by which the Drosophila ubx gene represses wing formation in the third thoracic (T3) segment provides the most explicit possible illustration of what it means for a hox gene to intervene negatively and switch off a local patterning GRN. In the absence of Ubx function in T3, what should be the haltere imaginal disc produces a wing, hence Ed Lewis’ famous 4-winged fly (Bender et al., 1983). Thus Ubx function is repressive with respect to the wing patterning GRN in the late T3 imaginal disc. The way this works is repression by Ubx and its cofactors of several genes of the wing GRN, as shown by analyses of Ubx− clones in the haltere disc and of Ubx+ clones in the wing disc (Galant et al., 2002; Weatherbee et al., 1998). These are direct cis-regulatory repressions. There are many arthropod examples not yet examined at the GRN level where the mechanisms of hox gene function must be similar. In arthropods the anterior boundaries of expression of the Ubx/Abd-A genes vary from class to class and sometimes among orders of the same class, e.g., among crustaceans, and this boundary is correlated with the type of appendage present on the segment; from these correlations, Ubx evidently represses execution of the patterning GRN underlying development of feeding appendages (maxillipeds) and permits development of locomotory thoracic appendages (Averof and Patel, 1997). This inference has been demonstrated, by experimentally decreasing or increasing Ubx expression in a shrimp that normally produces one pair of maxillipeds, with the result of producing additional pairs of these appendages or instead only thoracic legs, respectively (Liubicich et al., 2009; Pavlopoulos et al., 2009). Drosophila affords many further examples of hox gene switches that permit or preclude regional morphogenetic GRN function in body part formation, among the most convincing of which is in heart development (Lo et al., 2002). Further examples of regional hox gene control of specific body part identity by cis-regulatory intervention are in somatic muscle pair specification. Each muscle develops from founder cells expressing specific transcription factors, i.e., a specific regulatory state (Baylies et al., 1998). There are direct hox gene inputs into this process, for example, the alary muscles that connect the aorta of the heart and that require Ubx and AbdA for their development (Dubois et al., 2007; LaBeau et al., 2009). Throughout the body plan, hox genes control clade-specific deployment of organs and structures.

So in summary, the common statement that hox genes “pattern” this or that body part means that they provide negative or positive cis-regulatory inputs into genes that are engaged in the GRN circuits, which actually do the work of spatial patterning and body-part morphogenesis. Sometimes the hox gene inputs form part of the subcircuit itself as when there are feedback linkages between them and other regulatory genes, as in the later limb bud or rhombomere specification circuitry cited above. But in many more cases than those mentioned here the function of these regionally expressed genes is rather to provide a one-way switch that provides “go” or “no go” instruction to body-part-specific GRN patterning circuitry. In evolution the deployment of these switches, and the linkages between them and the body-part-specific subcircuits, are far more flexible than is the internal structure of these subcircuits. Some of these body-part-specific GRN structures are in evolutionary terms very ancient indeed.

Conservation and Change in Developmental GRNs

The self-described field of “evo-devo” has generated enormous masses of descriptive spatial gene expression data, a frequent object of which is to show evolutionary “conservation” of developmental gene use. Developmental gene use cannot truly be regarded as conserved unless the regulatory linkages surrounding the genes in the GRN are conserved. Thus gene expression data by themselves are a poor index of evolutionary conservation. Because negative results are uninformative, we learn little of what has changed by looking only at what has not. Unless all forms were “sprung forth fully blown” like Athena from the head of Zeus, the evolution of the diverse body plans of animals requires large-scale processes of change in ancestral developmental GRN architecture. Furthermore, what is it that is conserved: is it use of a given gene in a given developmental process? Is it use of a given gene in a given subcircuit in a given process? Here we consider evolutionary conservation and evolutionary change, not of specific individual gene use, but of specific GRN circuitry.

Conservation

The hierarchical Linnean classification system we use, including modern corrections based on molecular phylogenetics, essentially arranges animal body plans on the basis of their evolutionarily shared and derived characters (avoiding convergent associations). Shared body plan characters of given clades ultimately imply conserved developmental regulatory circuitry (Davidson and Erwin, 2009). But other apparently older characters are shared over huge phylogenetic distances across cladistic boundaries, being represented in multiple bilaterian phyla and in diverse body plans. These are particular body parts, such as hearts, and the major domains of brains, and particular cell types, such as muscle and neurons.

Because of their very widespread distribution, some differentiation gene batteries are probably among the oldest features of modern developmental GRNs (Davidson, 2006; Davidson and Erwin, 2009). But just as a cell type is not the same thing as a body part, so a differentiation gene battery is not the same thing as a cell type. During evolution the identity of the effector

genes can change radically, whereas the biological function of the cell type remains the same; and in addition, the cell type often has cell-biological or -morphological characteristics that are not encoded the same way as is activation of sets of effector genes. So we have to consider what GRN structures actually lie at the root of trans-phyletic cell-type conservation. A few examples may clarify this issue.

We know many cell types that are present in many types of animals, the specific properties of which depend on conserved differentiation gene batteries including both conserved downstream regulatory states and effector genes. For example, everyone is familiar with pan-eumetazoan (cnidarian plus bilaterian) conservation of striated and smooth muscle. Here the distinctive cellular morphology, the function, and, underlying these, the regulatory state consisting of myogenic bHLH factors and MEF2, plus downstream effector genes exemplified by the myosin heavy chain contractile protein, are all conserved (Seipel and Schmid, 2005).

The same is true of neuronal cell types (e.g., Hayakawa et al., 2004). There are many additional examples in which both regulatory state and effector genes are evidently conserved. A comparison between vertebrate and annelid light-sensitive nonocular neurosecretory cell types that produce vasotocin (vasopressin-neurophysin) as well as opsin provides a striking case (Tessmar-Raible et al., 2007). This cell type is located in the forebrains of both a polychaete annelid and zebrafish, as are also very similar chemosensory neurosecretory cell types that produce RF-amide. The vasotocinergic cells of both vertebrate and annelid express similar (Box IV) regulatory states, generated by the nk2.1, rx, and otp genes, as well as a gene producing the miR-7 microRNA that is also, in both organisms, expressed in the RF-amidonergic cells. Vasotocinergic neurosecretory cells were probably pan-bilaterian cell types, though genes encoding vasotocin have been lost in (sequenced) ecdysozoan lineages. Ocular photoreceptor cells provide another example of a pan-bilaterian cell type in which the Box IV GRN controlling the various subtypes of receptors (rhabdomeric receptors in insects, and rods and cones in vertebrates) operates downstream of regulatory genes of the K50 homeodomain family (Mishra et al., 2010; Ranade et al., 2008). These genes are otx2 and crx in mammals (Corbo et al., 2010; Hennig et al., 2008) and otd in Drosophila (where a paired class regulatory gene, pph13, which binds to the same sites as does Pax6, is also utilized in regulation of the same target genes). The transcription factors encoded by otd, or crx and otx2, directly activate the cis-regulatory control systems of the genes encoding the photoreceptor pigments in flies and mice. In addition, the targets of these regulatory genes, in both flies and mice, include phototransduction genes (rhodopsins, transducins, phosphodiesterase genes, arrestins) and cell morphogenesis genes (Ranade et al., 2008). The mammalian Box IV crx/otx2 GRN includes a canonical set of six other regulatory genes, interactions among which in mammals determine the photoreceptor subtype (Hennig et al., 2008; Swaroop et al., 2010). That is, in these cell types both downstream effector genes and their immediate regulatory apparatus are deployed in a manner that is widely conserved.

But there is another, profoundly interesting pattern of conservation displayed by pan-bilaterian cell types, in which the downstream effector genes are clade specific, whereas the definitive upstream regulatory states are conserved across clades. Immune cells provide the most evidence, for as knowledge of the diverse strategies for immune response, both adaptive and nonadaptive, extends beyond mammals, an amazing variety of effector genes is revealed but the same familiar sets of regulatory genes are found to control their expression. Lampreys, for example, have the equivalent of T cells and B cells but instead of somatically reassembled T- and B-immunoglobulin receptors, they express somatically reassembled variable leucine-rich repeat receptors (Guo et al., 2009; Herrin and Cooper, 2010). Yet the T cell-like lamprey cell regulatory state includes factors encoded by familiar T cell genes, such as bcl11b, gata2/3, and c-rel; and like T cells, their development depends on Notch signaling. In Drosophila, the pathogen-activated innate immune response, which deploys a number of antimicrobial effector molecules, depends, as does much of our very different innate immune response, on inducible regulatory factors of the NF-κB family (Hoffmann, 2003). And sea urchins, which employ a surprising and unique repertoire of hundreds of receptors of several different classes in their dedicated immune cells (Hibino et al., 2006; Messier-Solek et al., 2010), express in these cells a regulatory state very familiar to students of mammalian hematopoietic systems, such as the factors encoded by the scl, e2a, gata1/2/3, ikaros, and runx genes, and even a pu.1-like ets family gene.

Another system in which a conserved cell-type-specific regulatory state controls entirely different effector genes, which nonetheless execute the same function, is found in the cells that in development create the outer epidermal barrier against the external world, and which recreate this barrier in wound repair. In vertebrates the barrier is composed of a mixture of crosslinked keratins of diverse kinds, matrix proteins, lipids, special cornified membrane proteins, etc.; in insects it is composed of crosslinked chitins, plus other proteins and lipids. The structures are entirely nonhomologous in molecular identity. In mammals wound repair requires expression (among other proteins) of a crosslinking transglutaminase, whereas in Drosophila it requires expression of dopa decarboxylase and tyrosine hydroxylase, which generates quinones that crosslink chitin and cuticle proteins (Pearson et al., 2009; Ting et al., 2005). But in both flies and mice these functions are directly regulated by genes of the grainyhead family, which encodes transcription factors that utilize a unique DNA-binding domain (also found in fungi), plus other factors of the jun/fos family. The Box IV cell-type GRNs are conserved, but the effector genes are entirely diverse.

So we see that ancient cell-type-specific functions, which were utilized in the lineage ancestral to all bilaterians, are essentially defined by specific regulatory states, that is to say by genomically encoded GRN cassettes that produce cell-type-specific regulatory states. Sometimes the effector gene sets that these regulatory states animate are at least partially conserved, sometimes not. In evolutionary terms, the genomic repository of basic bilaterian (or eumetazoan) functions such as immunity, wound repair, contraction, and photoreception was built into these cell-type-specific regulatory cassettes, and they have ever since retained their identity.
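The second pattern described above, a conserved regulatory state driving clade-specific effector genes, can be made concrete with a small set comparison. The following is only an illustrative sketch: the gene sets are a toy encoding of the wound-repair example in the text (grainyhead and jun/fos family regulators; transglutaminase in mammals versus dopa decarboxylase and tyrosine hydroxylase in Drosophila), and the names `CellTypeGRN`, `jaccard`, and `conservation_profile` are hypothetical helpers introduced here, not part of any published analysis.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class CellTypeGRN:
    """Toy description of a cell type: regulatory state plus effector genes."""
    regulators: FrozenSet[str]
    effectors: FrozenSet[str]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two gene sets (defined as 1.0 if both are empty)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def conservation_profile(x: CellTypeGRN, y: CellTypeGRN) -> Dict[str, float]:
    """Compare two cell types separately at the regulator and effector layers."""
    return {
        "regulators": jaccard(x.regulators, y.regulators),
        "effectors": jaccard(x.effectors, y.effectors),
    }


# Assumption: simplified gene-family labels taken from the wound-repair example.
mouse_barrier = CellTypeGRN(
    regulators=frozenset({"grainyhead", "jun/fos"}),
    effectors=frozenset({"transglutaminase"}),
)
fly_barrier = CellTypeGRN(
    regulators=frozenset({"grainyhead", "jun/fos"}),
    effectors=frozenset({"dopa decarboxylase", "tyrosine hydroxylase"}),
)

profile = conservation_profile(mouse_barrier, fly_barrier)
print(profile)  # {'regulators': 1.0, 'effectors': 0.0}
```

A profile with high regulator similarity and low effector similarity corresponds to the epidermal barrier and immune cell cases, whereas muscle or photoreceptor cells would score high at both layers. A real comparison would need orthology assignments and experimentally validated linkages, which this toy deliberately omits.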

Some body parts are also conserved across the cladistic boundaries. This implies that there is something in the genetic programs for development of these body parts that is also conserved. However, in cases where the final structures are diverse, and develop via very diverse pattern formation and morphogenetic mechanisms, it may be that only the Box II GRN circuits encoding the initial establishment of the progenitor field from which the body part will be built are conserved, plus the final deployment of conserved cell types. Comparative GRN analysis is beginning to reveal “kernels” (Davidson and Erwin, 2006), in which regulatory genes wired together in certain conserved linkages execute upstream regulatory functions in development of given body parts. These circuits are characterized by extensive feedback wiring, and where tested, interference with expression of any of their genes results in developmental catastrophe. These features, and developmental canalization due to the upstream position of such kernels in the body-part GRN, explain their exceptional evolutionary conservation. Examples include what may be a pan-bilaterian (i.e., from flies to mice) kernel for heart specification (Davidson, 2006) and an (at least) pan-echinoderm kernel underlying mesoderm specification in both sea urchin and sea star development (McCauley et al., 2010) (these lineages have not shared a common ancestor since the end of the Cambrian). Similarly, a fundamental Box II subcircuit may underlie mesoderm specification in vertebrate embryogenesis (Swiers et al., 2010). A recursively wired triple feedback circuit has been proposed as a kernel underlying the pluripotent state of endothelial/hematopoietic precursors that arise in vertebrate development (Pimanda et al., 2007). There are also many less coherent observations, not yet at the level of an explicit GRN, in which detailed patterning similarities plus some gene interaction data strongly suggest the existence of GRN kernels that yet await elucidation. One convincing example is the brain, where a large amount of work has illuminated striking similarities in both A/P and mediolateral patterns of regulatory gene expression as well as homologous gene interactions between Drosophila and mouse (Davidson, 2006; Denes et al., 2007; Lowe et al., 2003; Seibert and Urbach, 2010; Tessmar-Raible et al., 2007).

Evolutionary Change in GRN Architecture
Evolutionary rewiring of GRN architecture by means of cis-regulatory co-optive change of given linkages among regulatory genes is the most common upper-level evolutionary mechanism by which developmental process is altered. That is, the GRN of a common ancestor is the source structure for diverse alterations in that structure in the derived descendants. But of course, not all parts of the structure are equally accessible to change, for the reasons we have tried here to point out. Sometimes the contrast between the conserved and nonconserved parts of a given GRN is quite dramatic, as in a comparison between sea urchin and sea star endomesodermal GRNs that revealed an extremely conserved, five-gene subcircuit, surrounded by linkages not one of which had survived the half-billion years since divergence without change (Hinman and Davidson, 2007). This was an inter-class comparison; for visualization of the process of evolutionary GRN rewiring the inter-ordinal comparison between developmental GRNs in Drosophila and Tribolium is illuminating.

Remarkable examples of architectural GRN rewiring have come to light in comparisons of the segmentation GRNs of various insects. Given that the short germ band mode of development appears to be plesiomorphic for insects and their sister group the crustaceans, the linkages seen in the early A/P patterning GRN of Tribolium, for example, may be closer to the ancestral linkages than the derived linkages of Drosophila GRNs. This is supported by a vast literature on many other insects and crustaceans as well (see following citations for references to work on other species). Every major aspect of A/P patterning analyzed at the gene interaction level appears to include some different linkages in Drosophila compared to Tribolium. For example, it had been thought that the absence of the bicoid gene outside of higher Diptera was compensated in other groups by a similar function of the anteriorly expressed otd gene, which encodes a regulator with a Bcd-like homeodomain and target specificity. But recent work shows that in Tribolium otd functions very differently from bcd in Drosophila, in that it operates through different downstream linkages (Kotkamp et al., 2010). Unlike bcd in Drosophila, it controls dorsal/ventral (D/V) patterning, by repressing sog expression; it affects zen expression; and it contributes no spatial A/P input to the patterning process. Similarly, it is clear that some of the GRN wiring downstream of the hunchback gene differs, for in Tribolium hb apparently does not directly control primary pair-rule genes and does not repress but rather activates giant (Choe et al., 2006; Marques-Souza et al., 2008). On the other hand, in both species hb sets the anterior boundary of Ubx expression and provides an activating input into the kruppel regulatory system (Marques-Souza et al., 2008). The architecture of pair-rule GRNs in Drosophila and Tribolium, which are composed of largely the same genes, is very different (Choe and Brown, 2009; Jaynes and Fujioka, 2004), but, amazingly, they generate the same downstream outcomes, the expression of wg and en across each parasegment border. Some linkages in the pair-rule GRNs are the same, but many are entirely rewired: for instance, eve directly represses wg in Tribolium whereas eve indirectly represses en in Drosophila (via slp; Choe and Brown, 2009). Upstream of this, in Tribolium is a possibly plesiomorphic kernel-like segmentation subcircuit, consisting of mutual interconnections among eve, runt, and odd, which runs sequentially to pattern the forming segments (Choe et al., 2006). Another example of extensive rewiring among the same genes engaged in the same developmental process since divergence between Coleoptera and Diptera is in eye specification. A comparison between the relatively well-known eye specification GRN of Drosophila (for reviews, Friedrich, 2006; Kumar, 2009) with that governing larval and adult eye specification in Tribolium (Yang et al., 2009) displays remarkable differences. The genes at the top of the Drosophila hierarchy, e.g., toy and ey (pax6 orthologs), are not even needed for adult eye development in Tribolium, where another gene in the same network, dachshund, operates redundantly with pax6, rather than being located downstream of the pax6 genes in the network.

Subcircuit Co-option
Considering what we know of how developmental GRNs are constructed, it is not surprising that successful developmental

programs are used repeatedly, plugged into various positions in the GRN hierarchy. One example of a subcircuit-level co-option event has been discovered in sea urchins, in which a class-specific evolutionary modification has caused the acquisition of an embryonic skeleton not present in other echinoderms. The shared feature of echinoderms is an endoskeleton in the adult organism. A comparison of regulatory gene expression in embryonic and adult skeletogenic precursor cells revealed a large overlap at least at the nodes of these GRNs, which very likely extends also to the linkages between them (Gao and Davidson, 2008; Oliveri et al., 2008). Regulatory genes exclusive to the embryonic GRN are those determining the embryonic location in which the skeletogenic GRN is activated. Thus, by modifying probably only a small number of cis-regulatory sequences, the skeletogenic GRN subcircuit is redeployed such that it is activated both in the embryo and in the adult, most likely by use of the exact same genomic sequences.

A similar interpretation has been applied to the apparent conservation of proximodistal patterning mechanisms in entirely nonhomologous bilaterian appendages (Lemons et al., 2010). Thus, Drosophila and vertebrate leg progenitor fields express the same set of regulatory genes in the same sequential order along the proximodistal axis, although as a result of different regulatory interactions. Interestingly, this very same sequence of regulatory gene expressions is observed also along the anteroposterior axis in the head neuroectoderm of Drosophila and Saccoglossus, a hemichordate lacking appendages. McGinnis et al. therefore propose that the similarity of patterning observed in these nonhomologous body parts might be the result of independent co-options of a subcircuit with conserved function (Lemons et al., 2010). A relatively recent co-option of an entire body part has occurred in teleost fish, resulting in the formation of a secondary jaw in the same location where ancient pharyngeal teeth developed (Fraser et al., 2009). Malawi cichlids, which possess both oral and pharyngeal jaws and teeth, show very similar expression of signaling molecules and transcription factors in tooth-forming cells in both locations, supporting the hypothesis that tooth development in oral and pharyngeal jaws is driven by the same tooth GRN. However, one substantial difference exists, which is the expression of a set of hox genes in the pharyngeal but not the oral jaw. In mouse pharyngeal arches, hox genes are expressed in all but the first pharyngeal arch, which gives rise to the oral jaw. Mutation of genes in the hoxa cluster results in formation of ectopic jaw-like skeletal structures, and hox genes are therefore thought to prevent the development of jaws in caudal pharyngeal arches (Minoux et al., 2009). In other words, the jaw-forming patterning GRN would be expressed in more posterior pharyngeal arches were it not for the repressive hox gene switch. Therefore, if the same was true in the teleost ancestor, a prerequisite for the evolutionary co-option of the jaw GRN to a caudal pharyngeal arch in cichlids would have been the uncoupling of this developmental program from the repressive hox input. These examples may display the co-optive redeployment of developmental GRNs and the switch-like function of hox genes in controlling spatial utilization of GRNs.

Conclusions
To make sense of the physical mechanisms that underlie the origin of animal body plans (Davidson and Erwin, 2009), we must consider how change in DNA sequence can affect development of the body plan at the system level. For development of the body plan is a heritable regulatory system process, which we can represent and manipulate and comprehend only in terms of genomically encoded GRN architecture. The evidence that comes to us from evo-devo comparisons of gene expression patterns, from detailed studies of regulatory changes in single genes, from direct comparative GRN analysis, from evolutionary conservation and evolutionary innovation, and from the fossil record can only be integrated in a mechanistic way by resolving the meaning of this evidence in terms of its import for developmental GRN architecture. This is the path to demystification of body-plan evolution. This project cannot be approached, except in an indirect exemplary sense, by looking at change in single cis-regulatory modules or single proteins, nor in ignorance of the regulatory gene interactions that constitute the architecture of developmental GRNs. The theory of evolution by change in GRN architecture also generates the path to experimental validation of evolutionary process by synthetic changes in developmental GRNs. This approach is already beginning to be applied, as we review above. The genomic control of the developmental process itself can only be understood in terms of the genomic regulatory system, and so must time-based change in that regulatory system, the basis of body-plan evolution.

ACKNOWLEDGMENTS

We gratefully acknowledge support for this work from NIH Grant HD-037105 and from NSF Grant IOS-0641398.

REFERENCES

Aboobaker, A., and Blaxter, M. (2010). The nematode story: Hox gene loss and rapid evolution. Adv. Exp. Med. Biol. 689, 101–110.
Abzhanov, A., Protas, M., Grant, B.R., Grant, P.R., and Tabin, C.J. (2004). Bmp4 and morphological variation of beaks in Darwin’s finches. Science 305, 1462–1465.
Abzhanov, A., Kuo, W.P., Hartmann, C., Grant, B.R., Grant, P.R., and Tabin, C.J. (2006). The calmodulin pathway and evolution of elongated beak morphology in Darwin’s finches. Nature 442, 563–567.
Averof, M., and Patel, N.H. (1997). Crustacean appendage evolution associated with changes in Hox gene expression. Nature 388, 682–686.
Balhoff, J.P., and Wray, G.A. (2005). Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites. Proc. Natl. Acad. Sci. USA 102, 8591–8596.
Baumer, N., Marquardt, T., Stoykova, A., Ashery-Padan, R., Chowdhury, K., and Gruss, P. (2002). Pax6 is required for establishing naso-temporal and dorsal characteristics of the optic vesicle. Development 129, 4535–4545.
Baylies, M.K., Bate, M., and Ruiz Gomez, M. (1998). Myogenesis: A view from Drosophila. Cell 93, 921–927.
Bender, W., Akam, M., Karch, F., Beachy, P.A., Peifer, M., Spierer, P., Lewis, E.B., and Hogness, D.S. (1983). Molecular genetics of the bithorax complex in Drosophila melanogaster. Science 221, 23–29.
Bolouri, H., and Davidson, E.H. (2010). The gene regulatory network basis of the “community effect,” and analysis of a sea urchin embryo example. Dev. Biol. 340, 170–178.

Britten, R.J. (1997). Mobile elements inserted in the distant past have taken on important functions. Gene 205, 177–182.
Britten, R.J., and Davidson, E.H. (1971). Repetitive and non-repetitive DNA sequences and a speculation on the origins of evolutionary novelty. Q. Rev. Biol. 46, 111–138.
Cameron, R.A., and Davidson, E.H. (2009). Flexibility of transcription factor target site position in conserved cis-regulatory modules. Dev. Biol. 336, 122–135.
Carapuco, M., Novoa, A., Bobola, N., and Mallo, M. (2005). Hox genes specify vertebral types in the presomitic mesoderm. Genes Dev. 19, 2116–2121.
Chan, Y.F., Marks, M.E., Jones, F.C., Villarreal, G., Jr., Shapiro, M.D., Brady, S.D., Southwick, A.M., Absher, D.M., Grimwood, J., Schmutz, J., et al. (2010). Adaptive evolution of pelvic reduction in sticklebacks by recurrent deletion of a Pitx1 enhancer. Science 327, 302–305.
Choe, C.P., and Brown, S.J. (2009). Genetic regulation of engrailed and wingless in Tribolium segmentation and the evolution of pair-rule segmentation. Dev. Biol. 325, 482–491.
Cobb, J., and Duboule, D. (2005). Comparative analysis of genes downstream of the Hoxd cluster in developing digits and external genitalia. Development 132, 3055–3067.
Choe, C.P., Miller, S.C., and Brown, S.J. (2006). A pair-rule gene circuit defines segments sequentially in the short-germ insect Tribolium castaneum. Proc. Natl. Acad. Sci. USA 103, 6560–6564.
Cole, N.J., Tanaka, M., Prescott, A., and Tickle, C. (2003). Expression of limb initiation genes and clues to the morphological diversification of threespine stickleback. Curr. Biol. 13, R951–R952.
Corbo, J.C., Lawrence, K.A., Karlstetter, M., Myers, C.A., Abdelaziz, M., Dirkes, W., Weigelt, K., Seifert, M., Benes, V., Fritsche, L.G., et al. (2010). CRX ChIP-seq reveals the cis-regulatory architecture of mouse photoreceptors. Genome Res. 20, 1512–1525.
Cretekos, C.J., Wang, Y., Green, E.D., Martin, J.F., Rasweiler, J.J.t., and Behringer, R.R. (2008). Regulatory divergence modifies limb length between mammals. Genes Dev. 22, 141–151.
Davidson, E.H. (1990). How embryos work: a comparative view of diverse modes of cell fate specification. Development 108, 365–389.
Davidson, E.H. (2001). Genomic Regulatory Systems: Development and Evolution (San Diego, CA: Academic Press).
Davidson, E.H. (2006). The Regulatory Genome. Gene Regulatory Networks in Development and Evolution (San Diego, CA: Academic Press/Elsevier).
Davidson, E.H., and Erwin, D.H. (2006). Gene regulatory networks and the evolution of animal body plans. Science 311, 796–800.
Davidson, E.H., and Erwin, D.H. (2009). An integrated view of precambrian eumetazoan evolution. Cold Spring Harb. Symp. Quant. Biol. 74, 65–80.
Davidson, E.H., and Erwin, D.H. (2010). Evolutionary innovation and stability in animal gene networks. J. Exp. Zoolog. B Mol. Dev. Evol. 314, 182–186.
Denes, A.S., Jekely, G., Steinmetz, P.R., Raible, F., Snyman, H., Prud’homme, B., Ferrier, D.E., Balavoine, G., and Arendt, D. (2007). Molecular architecture of annelid nerve cord supports common origin of nervous system centralization in bilateria. Cell 129, 277–288.
Dermitzakis, E.T., Bergman, C.M., and Clark, A.G. (2003). Tracing the evolutionary history of Drosophila regulatory regions with models that identify transcription factor binding sites. Mol. Biol. Evol. 20, 703–714.
Di-Poi, N., Montoya-Burgos, J.I., Miller, H., Pourquie, O., Milinkovitch, M.C., and Duboule, D. (2010). Changes in Hox genes’ structure and function during the evolution of the squamate body plan. Nature 464, 99–103.
Dubois, L., Enriquez, J., Daburon, V., Crozet, F., Lebreton, G., Crozatier, M., and Vincent, A. (2007). Collier transcription in a single Drosophila muscle lineage: the combinatorial control of muscle identity. Development 134, 4347–4355.
Elgar, G., and Vavouri, T. (2008). Tuning in to the signals: noncoding sequence conservation in vertebrate genomes. Trends Genet. 24, 344–352.
Erwin, D.H., and Davidson, E.H. (2009). The evolution of hierarchical gene regulatory networks. Nat. Rev. Genet. 10, 141–148.
Fraser, G.J., Hulsey, C.D., Bloomquist, R.F., Uyesugi, K., Manley, N.R., and Streelman, J.T. (2009). An ancient gene network is co-opted for teeth on old and new jaws. PLoS Biol. 7, e31.
Friedrich, M. (2006). Ancient mechanisms of visual sense organ development based on comparison of the gene networks controlling larval eye, ocellus, and compound eye specification in Drosophila. Arthropod Struct. Dev. 35, 357–378.
Galant, R., Walsh, C.M., and Carroll, S.B. (2002). Hox repression of a target gene: extradenticle-independent, additive action through multiple monomer binding sites. Development 129, 3115–3126.
Gao, F., and Davidson, E.H. (2008). Transfer of a large gene regulatory apparatus to a new developmental address in echinoid evolution. Proc. Natl. Acad. Sci. USA 105, 6091–6096.
Garza, D., Medhora, M., Koga, A., and Hartl, D.L. (1991). Introduction of the transposable element mariner into the germline of Drosophila melanogaster. Genetics 128, 303–310.
Gogvadze, E., and Buzdin, A. (2009). Retroelements and their impact on genome evolution and functioning. Cell. Mol. Life Sci. 66, 3727–3742.
Gompel, N., Prud’homme, B., Wittkopp, P.J., Kassner, V.A., and Carroll, S.B. (2005). Chance caught on the wing: cis-regulatory evolution and the origin of pigment patterns in Drosophila. Nature 433, 481–487.
Gonzalez, F., Duboule, D., and Spitz, F. (2007). Transgenic analysis of Hoxd gene regulation during digit development. Dev. Biol. 306, 847–859.
Gross, J.B., Borowsky, R., and Tabin, C.J. (2009). A novel role for Mc1r in the parallel evolution of depigmentation in independent populations of the cavefish Astyanax mexicanus. PLoS Genet. 5, e1000326.
Guo, P., Hirano, M., Herrin, B.R., Li, J., Yu, C., Sadlonova, A., and Cooper, M.D. (2009). Dual nature of the adaptive immune system in lampreys. Nature 459, 796–801.
Hare, E.E., Peterson, B.K., Iyer, V.N., Meier, R., and Eisen, M.B. (2008). Sepsid even-skipped enhancers are functionally conserved in Drosophila despite lack of sequence conservation. PLoS Genet. 4, e1000106.
Hayakawa, E., Fujisawa, C., and Fujisawa, T. (2004). Involvement of Hydra achaete-scute gene CnASH in the differentiation pathway of sensory neurons in the tentacles. Dev. Genes Evol. 214, 486–492.
Hennig, A.K., Peng, G.H., and Chen, S. (2008). Regulation of photoreceptor gene expression by Crx-associated transcription factor network. Brain Res. 1192, 114–133.
Herault, Y., Beckers, J., Kondo, T., Fraudeau, N., and Duboule, D. (1998). Genetic analysis of a Hoxd-12 regulatory element reveals global versus local modes of controls in the HoxD complex. Development 125, 1669–1677.
Herault, Y., Beckers, J., Gerard, M., and Duboule, D. (1999). Hox gene expression in limbs: colinearity by opposite regulatory controls. Dev. Biol. 208, 157–165.
Herpin, A., Braasch, I., Kraeussling, M., Schmidt, C., Thoma, E.C., Nakamura, S., Tanaka, M., and Schartl, M. (2010). Transcriptional rewiring of the sex determining dmrt1 gene duplicate by transposable elements. PLoS Genet. 6, e1000844.
Herrin, B.R., and Cooper, M.D. (2010). Alternative adaptive immunity in jawless vertebrates. J. Immunol. 185, 1367–1374.
Hibino, T., Loza-Coll, M., Messier, C., Majeske, A.J., Cohen, A.H., Terwilliger, D.P., Buckley, K.M., Brockton, V., Nair, S.V., Berney, K., et al. (2006). The immune gene repertoire encoded in the purple sea urchin genome. Dev. Biol. 300, 349–365.
Hinman, V.F., and Davidson, E.H. (2007). Evolutionary plasticity of developmental gene regulatory network architecture. Proc. Natl. Acad. Sci. USA 104, 19404–19409.
Ho, M.C., Johnsen, H., Goetz, S.E., Schiller, B.J., Bae, E., Tran, D.A., Shur, A.S., Allen, J.M., Rau, C., Bender, W., et al. (2009). Functional evolution of

cis-regulatory modules at a homeotic gene in Drosophila. PLoS Genet. 5, in hemichordates and the origins of the chordate nervous system. Cell 113, e1000709. 853–865. Hoffmann, J.A. (2003). The immune response of Drosophila. Nature 426, Ludwig, M.Z., Bergman, C., Patel, N.H., and Kreitman, M. (2000). Evidence for 33–38. stabilizing selection in a eukaryotic enhancer element. Nature 403, 564–567. Hong, J.W., Hendrix, D.A., Papatsenko, D., and Levine, M.S. (2008). How the Maeda, R.K., and Karch, F. (2009). The bithorax complex of Drosophila an Dorsal gradient works: insights from postgenome technologies. Proc. Natl. exceptional Hox cluster. Curr. Top. Dev. Biol. 88, 1–33. Acad. Sci. USA 105, 20072–20076. Marques-Souza, H., Aranda, M., and Tautz, D. (2008). Delimiting the Hueber, S.D., and Lohmann, I. (2008). Shaping segments: Hox gene function in conserved features of hunchback function for the trunk organization of insects. the genomic age. Bioessays 30, 965–979. Development 135, 881–888. Ikuta, T., Satoh, N., and Saiga, H. (2010). Limited functions of Hox genes in the Martinez, P., Rast, J.P., Arenas-Mena, C., and Davidson, E.H. (1999). Organi- larval development of the ascidian Ciona intestinalis. Development 137, 1505– zation of an echinoderm Hox gene cluster. Proc. Natl. Acad. Sci. USA 96, 1513. 1469–1474. Jaynes, J.B., and Fujioka, M. (2004). Drawing lines in the sand: even skipped McCauley, B.S., Weideman, E.P., and Hinman, V.F. (2010). A conserved gene et al. and parasegment boundaries. Dev. Biol. 269, 609–622. regulatory network subcircuit drives different developmental fates in the Jeffery, W.R. (2005). Adaptive evolution of eye degeneration in the Mexican vegetal pole of highly divergent echinoderm embryos. Dev. Biol. 340, 200–208. blind cavefish. J. Hered. 96, 185–196. McGregor, A.P., Orgogozo, V., Delon, I., Zanet, J., Srinivasan, D.G., Payre, F., Jeffery, W.R. (2009). 
Regressive evolution in Astyanax cavefish. Annu. Rev. and Stern, D.L. (2007). Morphological evolution through multiple cis-regulatory Genet. 43, 25–47. mutations at a single gene. Nature 448, 587–590. Jimenez-Delgado, S., Pascual-Anaya, J., and Garcia-Fernandez, J. (2009). Messier-Solek, C., Buckley, K.M., and Rast, J.P. (2010). Highly diversified Implications of duplicated cis-regulatory elements in the evolution of meta- innate receptor systems and new forms of animal immunity. Semin. Immunol. zoans: the DDI model or how simplicity begets novelty. Brief. Funct. Genomics 22, 39–47. 8 Proteomics , 266–275. Miller, C.T., Beleza, S., Pollen, A.A., Schluter, D., Kittles, R.A., Shriver, M.D., Johnson, R., Samuel, J., Ng, C.K., Jauch, R., Stanton, L.W., and Wood, I.C. and Kingsley, D.M. (2007). cis-Regulatory changes in Kit ligand expression (2009). Evolution of the vertebrate gene regulatory network controlled by the and parallel evolution of pigmentation in sticklebacks and humans. Cell 131, transcriptional repressor REST. Mol. Biol. Evol. 26, 1491–1507. 1179–1189. Kazazian, H.H., Jr. (2004). Mobile elements: drivers of genome evolution. Minoux, M., Antonarakis, G.S., Kmita, M., Duboule, D., and Rijli, F.M. (2009). Science 303, 1626–1632. Rostral and caudal pharyngeal arches share a common neural crest ground 136 Kmita, M., van Der Hoeven, F., Zakany, J., Krumlauf, R., and Duboule, D. pattern. Development , 637–645. (2000). Mechanisms of Hox gene colinearity: transposition of the anterior Mishra, M., Oke, A., Lebel, C., McDonald, E.C., Plummer, Z., Cook, T.A., and Hoxb1 gene into the posterior HoxD complex. Genes Dev. 14, 198–211. Zelhof, A.C. (2010). Pph13 and orthodenticle define a dual regulatory pathway 137 Koshiba-Takeuchi, K., Mori, A.D., Kaynak, B.L., Cebra-Thomas, J., Sukonnik, for photoreceptor cell morphogenesis and function. Development , 2895– T., Georges, R.O., Latham, S., Beck, L., Henkelman, R.M., Black, B.L., et al. 2904. (2009). 
Reptilian heart development and the molecular basis of cardiac Oda-Ishii, I., Bertrand, V., Matsuo, I., Lemaire, P., and Saiga, H. (2005). Making chamber evolution. Nature 461, 95–98. very similar embryos with divergent genomes: conservation of regulatory Kotkamp, K., Klingler, M., and Schoppmeier, M. (2010). Apparent role of Tribo- mechanisms of Otx between the ascidians Halocynthia roretzi and Ciona in- 132 lium orthodenticle in anteroposterior blastoderm patterning largely reflects testinalis. Development , 1663–1674. novel functions in dorsoventral axis formation and cell survival. Development Odom, D.T., Dowell, R.D., Jacobsen, E.S., Gordon, W., Danford, T.W., MacI- 137, 1853–1862. saac, K.D., Rolfe, P.A., Conboy, C.M., Gifford, D.K., and Fraenkel, E. (2007). Kumar, J.P. (2009). The molecular circuitry governing retinal determination. Tissue-specific transcriptional regulation has diverged significantly between Biochim. Biophys. Acta 1789, 306–314. human and mouse. Nat. Genet. 39, 730–732. LaBeau, E.M., Trujillo, D.L., and Cripps, R.M. (2009). Bithorax complex genes Ohno S., ed. (1970). Evolution by Gene Duplication (New York: Springer-Ver- control alary muscle patterning along the cardiac tube of Drosophila. Mech. lag). Dev. 126, 478–486. Ohshima, K., Hattori, M., Yada, T., Gojobori, T., Sakaki, Y., and Okada, N. Lemons, D., Fritzenwanker, J.H., Gerhart, J., Lowe, C.J., and McGinnis, W. (2003). Whole-genome screening indicates a possible burst of formation of (2010). Co-option of an anteroposterior head axis patterning system for prox- processed pseudogenes and Alu repeats by particular L1 subfamilies in imodistal patterning of appendages in early bilaterian evolution. Dev. Biol. 344, ancestral primates. Genome Biol. 4, R74. 358–362. Oliveri, P., Tu, Q., and Davidson, E.H. (2008). Global regulatory logic for spec- Liberman, L.M., and Stathopoulos, A. (2009). Design flexibility in cis-regulatory ification of an embryonic cell lineage. Proc. Natl. Acad. Sci. 
USA 105, 5955– control of gene expression: synthetic and comparative evidence. Dev. Biol. 5962. 327, 578–589. Ostertag, E.M., and Kazazian, H.H., Jr. (2001). Biology of mammalian L1 retro- Liubicich, D.M., Serano, J.M., Pavlopoulos, A., Kontarakis, Z., Protas, M.E., transposons. Annu. Rev. Genet. 35, 501–538. Kwan, E., Chatterjee, S., Tran, K.D., Averof, M., and Patel, N.H. (2009). Knock- Parker, H.G., VonHoldt, B.M., Quignon, P., Margulies, E.H., Shao, S., Mosher, down of Parhyale Ultrabithorax recapitulates evolutionary changes in crusta- D.S., Spady, T.C., Elkahloun, A., Cargill, M., Jones, P.G., et al. (2009). An ex- 106 cean appendage morphology. Proc. Natl. Acad. Sci. USA , 13892–13896. pressed fgf4 retrogene is associated with breed-defining chondrodysplasia in Lo, P.C., Skeath, J.B., Gajewski, K., Schulz, R.A., and Frasch, M. (2002). domestic dogs. Science 325, 995–998. Homeotic genes autonomously specify the anteroposterior subdivision of Pavlopoulos, A., Kontarakis, Z., Liubicich, D.M., Serano, J.M., Akam, M., Patel, 251 the Drosophila dorsal vessel into aorta and heart. Dev. Biol. , 307–319. N.H., and Averof, M. (2009). Probing the evolution of appendage specialization Logan, M., and Tabin, C.J. (1999). Role of Pitx1 upstream of Tbx4 in specifica- by Hox gene misexpression in an emerging model crustacean. Proc. Natl. tion of hindlimb identity. Science 283, 1736–1739. Acad. Sci. USA 106, 13897–13902. Lowe, C.J., Wu, M., Salic, A., Evans, L., Lander, E., Stange-Thomann, N., Pearson, J.C., Lemons, D., and McGinnis, W. (2005). Modulating Hox gene Gruber, C.E., Gerhart, J., and Kirschner, M. (2003). Anteroposterior patterning functions during animal body patterning. Nat. Rev. Genet. 6, 893–904.

984 Cell 144, March 18, 2011 ª2011 Elsevier Inc. Pearson, J.C., Juarez, M.T., Kim, M., Drivenes, O., and McGinnis, W. (2009). Swaroop, A., Kim, D., and Forrest, D. (2010). Transcriptional regulation of Multiple transcription factor codes activate epidermal wound-response genes photoreceptor development and homeostasis in the mammalian retina. Nat. in Drosophila. Proc. Natl. Acad. Sci. USA 106, 2224–2229. Rev. Neurosci. 11, 563–576. Pennacchio, L.A., Ahituv, N., Moses, A.M., Prabhakar, S., Nobrega, M.A., Swiers, G., Chen, Y.H., Johnson, A.D., and Loose, M. (2010). A conserved Shoukry, M., Minovitsky, S., Dubchak, I., Holt, A., Lewis, K.D., et al. (2006). mechanism for vertebrate mesoderm specification in urodele amphibians In vivo enhancer analysis of human conserved non-coding sequences. Nature and mammals. Dev. Biol. 343, 138–152. 444 , 499–502. Szeto, D.P., Rodriguez-Esteban, C., Ryan, A.K., O’Connell, S.M., Liu, F., Peter, I.S., and Davidson, E.H. (2009). Modularity and design principles in the Kioussi, C., Gleiberman, A.S., Izpisua-Belmonte, J.C., and Rosenfeld, M.G. sea urchin embryo gene regulatory network. FEBS Lett. 583, 3948–3958. (1999). Role of the Bicoid-related homeodomain factor Pitx1 in specifying 13 Pimanda, J.E., Ottersbach, K., Knezevic, K., Kinston, S., Chan, W.Y., Wilson, hindlimb morphogenesis and pituitary development. Genes Dev. , 484–494. N.K., Landry, J.R., Wood, A.D., Kolb-Kokocinski, A., Green, A.R., et al. (2007). Tarchini, B., and Duboule, D. (2006). Control of Hoxd genes’ collinearity during Gata2, Fli1, and Scl form a recursively wired gene-regulatory circuit during early limb development. Dev. Cell 10, 93–103. early hematopoietic development. Proc. Natl. Acad. Sci. USA 104, 17692– Tessmar-Raible, K., Raible, F., Christodoulou, F., Guy, K., Rembold, M., 17697. Hausen, H., and Arendt, D. (2007). 
Conserved sensory-neurosecretory cell Protas, M.E., Hersey, C., Kochanek, D., Zhou, Y., Wilkens, H., Jeffery, W.R., types in annelid and fish forebrain: insights into hypothalamus evolution. Cell Zon, L.I., Borowsky, R., and Tabin, C.J. (2006). Genetic analysis of cavefish 129, 1389–1400. reveals molecular convergence in the evolution of albinism. Nat. Genet. 38, Ting, S.B., Caddy, J., Hislop, N., Wilanowski, T., Auden, A., Zhao, L.L., Ellis, S., 107–111. Kaur, P., Uchida, Y., Holleran, W.M., et al. (2005). A homolog of Drosophila Prud’homme, B., Gompel, N., Rokas, A., Kassner, V.A., Williams, T.M., Yeh, grainy head is essential for epidermal integrity in mice. Science 308, 411–413. S.D., True, J.R., and Carroll, S.B. (2006). Repeated morphological evolution Tumpel, S., Cambronero, F., Ferretti, E., Blasi, F., Wiedemann, L.M., and through cis-regulatory changes in a pleiotropic gene. Nature 440, 1050–1053. Krumlauf, R. (2007). Expression of Hoxa2 in rhombomere 4 is regulated by Ranade, S.S., Yang-Zhou, D., Kong, S.W., McDonald, E.C., Cook, T.A., and a conserved cross-regulatory mechanism dependent upon Hoxb1. Dev. Pignoni, F. (2008). Analysis of the Otd-dependent transcriptome supports Biol. 302, 646–660. the evolutionary conservation of CRX/OTX/OTD functions in flies and verte- Tumpel, S., Wiedemann, L.M., and Krumlauf, R. (2009). Hox genes and brates. Dev. Biol. 315, 521–534. segmentation of the vertebrate hindbrain. Curr. Top. Dev. Biol. 88, 103–137. Rastegar, S., Hess, I., Dickmeis, T., Nicod, J.C., Ertzer, R., Hadzhiev, Y., Thies, Vavouri, T., Walter, K., Gilks, W.R., Lehner, B., and Elgar, G. (2007). Parallel W.G., Scherer, G., and Strahle, U. (2008). The words of the regulatory code are evolution of conserved non-coding elements that target a common set of arranged in a variable manner in highly conserved enhancers. Dev. Biol. 318, developmental regulatory genes from worms to humans. Genome Biol. 8, R15. 366–377. 
Vinagre, T., Moncaut, N., Carapuco, M., Novoa, A., Bom, J., and Mallo, M. Rebeiz, M., Pool, J.E., Kassner, V.A., Aquadro, C.F., and Carroll, S.B. (2009). (2010). Evidence for a myotomal Hox/Myf cascade governing nonautonomous Stepwise modification of a modular enhancer underlies adaptation in control of rib specification within global vertebral domains. Dev. Cell 18, a Drosophila population. Science 326, 1663–1667. 655–661. Rokas, A., and Carroll, S.B. (2006). Bushes in the tree of life. PLoS Biol. 4, e352. Walters, J., Binkley, E., Haygood, R., and Romano, L.A. (2008). Evolutionary Ruvkun, G., Wightman, B., Burglin, T., and Arasu, P. (1991). Dominant gain-of- analysis of the cis-regulatory region of the spicule matrix gene SM50 in strong- 315 function mutations that lead to misregulation of the C. elegans heterochronic ylocentrotid sea urchins. Dev. Biol. , 567–578. gene lin-14, and the evolutionary implications of dominant mutations in Wang, J., Lee, A.P., Kodzius, R., Brenner, S., and Venkatesh, B. (2009). Large pattern-formation genes. Dev. Suppl. 1, 47–54. number of ultraconserved elements were already present in the jawed verte- 26 Seibert, J., and Urbach, R. (2010). Role of en and novel interactions between brate ancestor. Mol. Biol. Evol. , 487–490. msh, ind, and vnd in dorsoventral patterning of the Drosophila brain and ventral Weatherbee, S.D., Halder, G., Kim, J., Hudson, A., and Carroll, S. (1998). Ultra- nerve cord. Dev. Biol. 346, 332–345. bithorax regulates genes at several levels of the wing-patterning hierarchy to 12 Seipel, K., and Schmid, V. (2005). Evolution of striated muscle: jellyfish and the shape the development of the Drosophila haltere. Genes Dev. , 1474–1482. origin of triploblasty. Dev. Biol. 282, 14–26. Wellik, D.M. (2009). Hox genes and vertebrate axial pattern. Curr. Top. Dev. Biol. 88, 257–278. Shapiro, M.D., Bell, M.A., and Kingsley, D.M. (2006). Parallel genetic origins of pelvic reduction in vertebrates. Proc. Natl. Acad. Sci. 
USA 103, 13753–13758. Wellik, D.M., and Capecchi, M.R. (2003). Hox10 and Hox11 genes are required to globally pattern the mammalian skeleton. Science 301, 363–367. Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A.S., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., et al. (2005). Evolution- Woltering, J.M., and Duboule, D. (2010). The origin of digits: Expression arily conserved elements in vertebrate, insect, worm, and yeast genomes. patterns versus regulatory mechanisms. Dev. Cell 18, 526–532. Genome Res. 15, 1034–1050. Woltering, J.M., Vonk, F.J., Muller, H., Bardine, N., Tuduce, I.L., de Bakker, Simon, J., Peifer, M., Bender, W., and O’Connor, M. (1990). Regulatory M.A., Knochel, W., Sirbu, I.O., Durston, A.J., and Richardson, M.K. (2009). elements of the bithorax complex that control expression along the anterior- Axial patterning in snakes and caecilians: evidence for an alternative interpre- posterior axis. EMBO J. 9, 3945–3956. tation of the Hox code. Dev. Biol. 332, 82–89. Spitz, F., Gonzalez, F., Peichel, C., Vogt, T.F., Duboule, D., and Zakany, J. Yamamoto, Y., Stock, D.W., and Jeffery, W.R. (2004). Hedgehog signalling 431 (2001). Large scale transgenic and cluster deletion analysis of the HoxD controls eye degeneration in blind cavefish. Nature , 844–847. complex separate an ancestral regulatory module from evolutionary innova- Yang, X., Zarinkamar, N., Bao, R., and Friedrich, M. (2009). Probing the tions. Genes Dev. 15, 2209–2214. Drosophila retinal determination gene network in Tribolium (I): The early retinal 333 Spitz, F., Gonzalez, F., and Duboule, D. (2003). A global control region defines genes dachshund, eyes absent and sine oculis. Dev. Biol. , 202–214. a chromosomal regulatory landscape containing the HoxD cluster. Cell 113, Zakany, J., Kmita, M., and Duboule, D. (2004). A dual role for Hox genes in limb 405–417. anterior-posterior asymmetry. Science 304, 1669–1672.

Leading Edge Review

Interactome Networks and Human Disease

Marc Vidal,1,2,* Michael E. Cusick,1,2 and Albert-László Barabási1,3,4,* 1Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA 2Department of Genetics, Harvard Medical School, Boston, MA 02115, USA 3Center for Complex Network Research (CCNR) and Departments of Physics, Biology and Computer Science, Northeastern University, Boston, MA 02115, USA 4Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA *Correspondence: [email protected] (M.V.), [email protected] (A.-L.B.) DOI 10.1016/j.cell.2011.02.016

Complex biological systems and cellular networks may underlie most genotype to phenotype relationships. Here, we review basic concepts in network biology, discussing different types of interactome networks and the insights that can come from analyzing them. We elaborate on why interactome networks are important to consider in biology, how they can be mapped and integrated with each other, what global properties are starting to emerge from interactome network models, and how these properties may relate to human disease.

Introduction

Since the advent of molecular biology, considerable progress has been made in the quest to understand the mechanisms that underlie human disease, particularly for genetically inherited disorders. Genotype-phenotype relationships, as summarized in the Online Mendelian Inheritance in Man (OMIM) database (Amberger et al., 2009), include mutations in more than 3000 human genes known to be associated with one or more of over 2000 human disorders. This is a truly astounding number of genotype-phenotype relationships considering that a mere three decades have passed since the initial description of Restriction Fragment Length Polymorphisms (RFLPs) as molecular markers to map genetic loci of interest (Botstein et al., 1980), only two decades since the announcement of the first positional cloning experiments of disease-associated genes using RFLPs (Amberger et al., 2009), and just one decade since the release of the first reference sequences of the human genome (Lander et al., 2001; Venter et al., 2001). For complex traits, the information gathered by recent genome-wide association studies suggests high-confidence genotype-phenotype associations between close to 1000 genomic loci and one or more of over one hundred diseases, including diabetes, obesity, Crohn’s disease, and hypertension (Altshuler et al., 2008). The discovery of genomic variations involved in cancer, inherited in the germline or acquired somatically, is equally striking, with hundreds of human genes found linked to cancer (Stratton et al., 2009). In light of new powerful technological developments such as next-generation sequencing, it is easily imaginable that a catalog of nearly all human genomic variations, whether deleterious, advantageous, or neutral, will be available within our lifetime.

Despite the natural excitement emerging from such a huge body of information, daunting challenges remain. Practically, the genomic revolution has, thus far, seldom translated directly into the development of new therapeutic strategies, and the mechanisms underlying genotype-phenotype relationships remain only partially explained. Assuming that, with time, most human genotypic variations will be described together with phenotypic associations, there would still be major problems to fully understand and model human genetic variations and their impact on diseases.

To understand why, consider the “one-gene/one-enzyme/one-function” concept originally framed by Beadle and Tatum (Beadle and Tatum, 1941), which holds that simple, linear connections are expected between the genotype of an organism and its phenotype. But the reality is that most genotype-phenotype relationships arise from a much higher underlying complexity. Combinations of identical genotypes and nearly identical environments do not always give rise to identical phenotypes. The very coining of the words “genotype” and “phenotype” by Johannsen more than a century ago derived from observations that inbred isogenic lines of bean plants grown in well-controlled environments give rise to pods of different size (Johannsen, 1909). Identical twins, although strikingly similar, nevertheless often exhibit many differences (Raser and O’Shea, 2005). Likewise, genotypically indistinguishable bacterial or yeast cells grown side by side can express different subsets of transcripts and gene products at any given moment (Elowitz et al., 2002; Blake et al., 2003; Taniguchi et al., 2010). Even straightforward Mendelian traits are not immune to complex genotype-phenotype relationships. Incomplete penetrance, variable expressivity, differences in age of onset, and modifier mutations are more frequent than generally appreciated (Perlis et al., 2010).

We, along with others, argue that the way beyond these challenges is to decipher the properties of biological systems, and in particular, those of molecular networks taking place within cells. As is becoming increasingly clear, biological systems and cellular networks are governed by specific laws and principles, the understanding of which will be essential for a deeper comprehension of biology (Nurse, 2003; Vidal, 2009).

Accordingly, our goal is to review key aspects of how complex systems operate inside cells. Particularly, we will review how by interacting with each other, genes and their products form complex networks within cells. Empirically determining and modeling cellular networks for a few model organisms and for
human has provided a necessary scaffold toward understanding the functional, logical and dynamical aspects of cellular systems. Importantly, we will discuss the possibility that phenotypes result from perturbations of the properties of cellular systems and networks. The link between network properties and phenotypes, including susceptibility to human disease, appears to be at least as important as that between genotypes and phenotypes (Figure 1).

Figure 1. Perturbations in Biological Systems and Cellular Networks May Underlie Genotype-Phenotype Relationships
By interacting with each other, genes and their products form complex cellular networks. The link between perturbations in network and systems properties and phenotypes, such as Mendelian disorders, complex traits, and cancer, might be as important as that between genotypes and phenotypes.

Cells as Interactome Networks

Systems biology can be said to have originated more than half a century ago, when a few pioneers initially formulated a theoretical framework according to which multiscale dynamic complex systems formed by interacting macromolecules could underlie cellular behavior (Vidal, 2009). These theoretical systems biology ideas were elaborated upon at a time when there was little knowledge of the exact nature of the molecular components of biology, let alone any detailed information on functional and biophysical interactions between them. While greatly inspirational to a few specialists, systems concepts remained largely ignored by most molecular biologists, at least until empirical observations could be gathered to validate them. Meanwhile, theoretical representations of cellular organization evolved steadily, closely following the development of ever improving molecular technologies. The organizational view of the cell changed from being merely a “bag of enzymes” to a web of highly interrelated and interconnected organelles (Robinson et al., 2007). Cells can accordingly be envisioned as complex webs of macromolecular interactions, the full complement of which constitutes the “interactome” network. At the dawn of the 21st century, with most components of cellular networks having been identified, the basic ideas of systems and network biology are ready to be experimentally tested and applied to relevant biological problems.

Mapping Interactome Networks

Network science deals with complexity by “simplifying” complex systems, summarizing them merely as components (nodes) and interactions (edges) between them. In this simplified approach, the functional richness of each node is lost. Despite or even perhaps because of such simplifications, useful discoveries can be made. As regards cellular systems, the nodes are metabolites and macromolecules such as proteins, RNA molecules and gene sequences, while the edges are physical, biochemical and functional interactions that can be identified with a plethora of technologies. One challenge of network biology is to provide maps of such interactions using systematic and standardized approaches and assays that are as unbiased as possible. The resulting “interactome” networks, the networks of interactions between cellular components, can serve as scaffold information to extract global or local graph theory properties. Once shown to be statistically different from randomized networks, such properties can then be related back to a better understanding of biological processes. Potentially powerful details of each interaction in the network are left aside, including functional, dynamic and logical features, as well as biochemical and structural aspects such as protein post-translational modifications or allosteric changes. The power of the approach resides precisely in such simplification of molecular detail, which allows modeling at the scale of whole cells.

Early attempts at experimental proteome-scale interactome network mapping in the mid-1990s (Finley and Brent, 1994; Bartel et al., 1996; Fromont-Racine et al., 1997; Vidal, 1997) were inspired by several conceptual advances in biology. The biochemistry of metabolic pathways had already given rise to cellular scale representations of metabolic networks. The discovery of signaling pathways and cross-talk between them, as well as large molecular complexes such as RNA polymerases, all involving innumerable physical protein-protein interactions, suggested the existence of highly connected webs of interactions. Finally, the rapidly growing identification of many individual interactions between transcription factors and specific DNA regulatory sequences involved in the regulation of gene expression raised the question of how transcriptional regulation is globally organized within cells.

Three distinct approaches have been used since to capture interactome networks: (1) compilation or curation of already existing data available in the literature, usually obtained from one or just a few types of physical or biochemical interactions (Roberts, 2006); (2) computational predictions based on available “orthogonal” information apart from physical or biochemical interactions, such as sequence similarities, gene-order conservation, copresence and coabsence of genes in completely sequenced genomes and protein structural information (Marcotte and Date, 2001); and (3) systematic, unbiased high-throughput experimental mapping strategies applied at the scale of whole genomes or proteomes (Walhout and Vidal, 2001). These approaches, though complementary, differ greatly in the possible interpretations of the resulting maps. Literature-curated maps present the advantage of using already available information, but are limited by the inherently variable quality of the published data, the lack of systematization, and the absence of reporting of negative data (Cusick et al., 2009; Turinsky
et al., 2010). Computational prediction maps are fast and efficient to implement, and usually include satisfyingly large numbers of nodes and edges, but are necessarily imperfect because they use indirect information (Plewczynski and Ginalski, 2009). While high-throughput maps attempt to describe unbiased, systematic, and well-controlled data, they were initially more difficult to establish, although recent technological advances suggest that near completion can be reached within a few years for highly reliable, comprehensive protein-protein interaction and gene regulatory network maps for human (Venkatesan et al., 2009).

The mapping and analysis of interactome networks for model organisms was instrumental in getting to this point. Such efforts provided, and will continue to provide, both necessary pioneering technologies and crucial conceptual insights. As with other aspects of biology, advancements in mapping of interactome networks would have been minimal without a focus on model organisms (Davis, 2004). The field of interactome mapping has been helped by developments in several model organisms, primarily the yeast, Saccharomyces cerevisiae, the fly, Drosophila melanogaster, and the worm, Caenorhabditis elegans (Figure 2). For instance, genome-wide resources such as collections of all, or nearly all, open reading frames (ORFeomes) were first generated for these model organisms, both because their genomes are the best annotated and because there are fewer complications, such as the high number of splice variants in human and other mammals. ORFeome resources allow efficient transfer of large numbers of ORFs into vectors suitable for diverse interactome mapping technologies (Hartley et al., 2000; Walhout et al., 2000b). Moreover, gene ablation technologies, knockouts (for yeast) and knockdowns by RNAi (for worms and flies) and transposon insertions (for plants), were discovered in and are being applied genome-wide for these model organisms (Mohr et al., 2010).

Figure 2. Networks in Cellular Systems
To date, cellular networks are most available for the “super-model” organisms (Davis, 2004) yeast, worm, fly, and plant. High-throughput interactome mapping relies upon genome-scale resources such as ORFeome resources. Several types of interactome networks discussed are depicted. In a protein interaction network, nodes represent proteins and edges represent physical interactions. In a transcriptional regulatory network, nodes represent transcription factors (circular nodes) or putative DNA regulatory elements (diamond nodes); and edges represent physical binding between the two. In a disease network, nodes represent diseases, and edges represent gene mutations of which are associated with the linked diseases. In a virus-host network, nodes represent viral proteins (square nodes) or host proteins (round nodes), and edges represent physical interactions between the two. In a metabolic network, nodes represent enzymes, and edges represent metabolites that are products or substrates of the enzymes. The network depictions seem dense, but they represent only small portions of available interactome network maps, which themselves constitute only a few percent of the complete interactomes within cells.

Metabolic Networks

Metabolic network maps attempt to comprehensively describe all possible biochemical reactions for a particular cell or organism (Schuster et al., 2000; Edwards et al., 2001). In many representations of metabolic networks, nodes are biochemical metabolites and edges are either the reactions that convert one metabolite into another or the enzymes that catalyze these reactions (Jeong et al., 2000; Schuster et al., 2000) (Figure 2). Edges can be directed or undirected, depending on whether a given reaction is reversible or not. In specific cases of metabolic network modeling, the converse situation can be used, with nodes representing enzymes and edges pointing to adjacent pairs of enzymes for which the product of one is the substrate of the other (Lee et al., 2008).

Although large metabolic pathway charts have existed for decades (Kanehisa et al., 2008), nearly complete metabolic network maps required the completion of full genome sequencing together with accurate gene annotation tools (Oberhardt et al., 2009). Network construction is manual with computational assistance, involving: (1) the meticulous curation of large numbers of publications, each describing experimental results regarding one or several metabolic reactions characterized from purified or reconstituted enzymes, and (2) when necessary, the compilation of predicted reactions from studies of orthologous enzymes experimentally characterized in other species. Assembly of the union of all experimentally demonstrated and predicted reactions gives rise to proteome-scale network maps (Mo and Palsson, 2009). Such maps have been compiled for numerous species, predominantly prokaryotes
and unicellular eukaryotes (Oberhardt et al., 2009), and full-scale metabolic reconstructions are now underway for human as well (Ma et al., 2007). Metabolic network maps are likely the most comprehensive of all biological networks, although considerable gaps will remain to be filled in by direct experimental investigations.

Protein-Protein Interaction Networks

In protein-protein interaction network maps, nodes represent proteins and edges represent a physical interaction between two proteins. The edges are nondirected, as it cannot be said which protein binds the other, that is, which partner functionally influences the other (Figure 2). Of the many methodologies that can map protein-protein interactions, two are currently in wide use for large-scale mapping. Mapping of binary interactions is primarily carried out by ever improving variations of the yeast two-hybrid (Y2H) system (Fields and Song, 1989; Dreze et al., 2010). Mapping of membership in protein complexes, providing indirect associations between proteins, is carried out by affinity- or immunopurification to isolate protein complexes, followed by some form of mass spectrometry (AP/MS) to identify protein constituents of these complexes (Rigaut et al., 1999; Charbonnier et al., 2008). While Y2H datasets contain mostly direct binary interactions, AP/MS cocomplex data sets are composed of direct interactions mixed with a preponderance of indirect associations. Accordingly, the graphs generated by these two approaches exhibit different global properties (Seebacher and Gavin, 2011), such as the relationships between gene essentiality and the number of interacting proteins (Yu et al., 2008).

In the past decade, significant steps have been taken toward the generation of comprehensive protein-protein interaction network maps. Comprehensive efforts using Y2H technologies to generate interactome maps began with the model organisms S. cerevisiae, C. elegans, and D. melanogaster (Ito et al., 2000, 2001; Uetz et al., 2000; Walhout et al., 2000a; Giot et al., 2003; Reboul et al., 2003; Li et al., 2004), and eventually included human (Colland et al., 2004; Rual et al., 2005; Stelzl et al., 2005; Venkatesan et al., 2009). Comprehensive mapping of co-complex membership by high-throughput AP/MS was initially undertaken in yeast (Gavin et al., 2002; Ho et al., 2002), rapidly progressing to ever improving completeness and quality thereafter (Gavin et al., 2006; Krogan et al., 2006). For technical reasons future comprehensive AP/MS efforts will stay focused on unicellular organisms such as yeast (Collins et al., 2007) and mycoplasma (Kuhner et al., 2009), whereas Y2H efforts are more readily implemented for complex multicellular organisms (Seebacher and Gavin, 2011).

In their early implementations, systematic and comprehensive interaction network mapping efforts met with skepticism regarding their accuracy (von Mering et al., 2002), analogous to the original concerns over whether automated high-throughput genome sequencing efforts might have considerably lower accuracy than dedicated efforts carried out cumulatively in many laboratories. Only after the emergence of rigorous statistical tests to estimate sequencing accuracy could high-throughput sequencing efforts reach their full potential (Ewing et al., 1998). Analogously, an empirical framework recently propagated for protein interaction mapping (Venkatesan et al., 2009) now allows the estimation of overall accuracy and sensitivity for maps obtained using high-throughput mapping approaches. Four critical parameters need to be estimated: (1) completeness: the number of physical protein pairs actually tested in a given search space; (2) assay sensitivity: which interactions can and cannot be detected by a particular assay; (3) sampling sensitivity: the fraction of all detectable interactions found by a single implementation of any interaction assay; and (4) precision: the proportion of true biophysical interactors. Careful consideration of these parameters offers a quantitative idea of the completeness and accuracy of a particular high-throughput interaction map (Yu et al., 2008; Simonis et al., 2009; Venkatesan et al., 2009), and allows comparison of multiple maps as long as standardized framework parameters are used. In contrast, comparing the results of small-scale experiments available in literature curated databases is not possible, as there is little way to control for accuracy, reproducibility, and sensitivity. The binary interactome empirical framework offers a way to estimate the size of interactome networks, which in turn is essential to define a roadmap to reach completion for the interactome mapping efforts of any species of interest. While originally established for protein-protein interaction mapping, similar empirical frameworks can be applied more broadly to mapping of other types of interactome networks (Costanzo et al., 2010).

Gene Regulatory Networks

In most gene regulatory network maps, nodes are either a transcription factor or a putative DNA regulatory element, and directed edges represent the physical binding of transcription factors to such regulatory elements (Zhu et al., 2009). Edges can be said to be incoming (transcription factor binds a regulatory DNA element) or outgoing (regulatory DNA element bound by a transcription factor) (Figure 2). Currently, two general approaches are amenable to large-scale mapping of gene regulatory networks. In yeast one-hybrid (Y1H) approaches, a putative cis-regulatory DNA sequence, commonly a suspected promoter region, is used as bait to capture transcription factors that bind to that sequence (Deplancke et al., 2004). In chromatin immunoprecipitation (ChIP) approaches, antibodies raised against transcription factors of interest, or against a peptide tag used in fusion with potential transcription factors, are used to immunoprecipitate potentially interacting cross-linked DNA fragments (Lee et al., 2002). As Y1H proceeds from genes and captures associated proteins it is said to be "gene-centric," whereas ChIP strategies are "protein-centric" in that they proceed from transcription factors and attempt to capture associated gene regions (Walhout, 2006). The two approaches are complementary. The Y1H system can discover novel transcription factors but relies on having known, or at least suspected, regulatory regions; ChIP methods can discover novel regulatory motifs but rely on the availability of reagents specific to transcription factors of interest, which themselves depend on accurate predictions of transcription factors (Reece-Hoyes et al., 2005; Vaquerizas et al., 2009). Large-scale Y1H networks have been produced for C. elegans (Vermeirssen et al., 2007; Grove et al., 2009). Large-scale ChIP-based networks have been produced for yeast (Lee et al., 2002) and have been carried out for mammalian tissue culture cells as well (Cawley et al., 2004).
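The node and edge conventions for gene regulatory maps can be made concrete with a small sketch. The Python below assumes a hypothetical list of binding events between transcription factors and regulatory elements (the names `TF_A`, `promoter_1`, and so on are invented for illustration and come from no dataset). It tallies how many elements each factor binds and how many factors bind each element.

```python
from collections import defaultdict

# Hypothetical binding events: (transcription factor, regulatory element).
bindings = [
    ("TF_A", "promoter_1"),
    ("TF_A", "promoter_2"),
    ("TF_A", "promoter_3"),
    ("TF_B", "promoter_2"),
    ("TF_C", "promoter_2"),
]


def degree_tables(edges):
    """Return (elements bound per factor, factors bound per element)."""
    targets = defaultdict(set)     # factor  -> elements it binds
    regulators = defaultdict(set)  # element -> factors that bind it
    for factor, element in edges:
        targets[factor].add(element)
        regulators[element].add(factor)
    per_factor = {f: len(t) for f, t in targets.items()}
    per_element = {e: len(r) for e, r in regulators.items()}
    return per_factor, per_element


per_factor, per_element = degree_tables(bindings)
```

Using sets means a binding reported twice (for instance by both Y1H and ChIP) is counted once, which is one possible design choice rather than a requirement of either method.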

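The four parameters of the empirical framework described under Protein-Protein Interaction Networks can be combined into a rough size estimate. The sketch below is one plausible reading, not necessarily the exact procedure of the cited studies. It assumes that completeness, assay sensitivity, and sampling sensitivity are all expressed as fractions between 0 and 1 and that they multiply to give the share of true interactions a screen can recover. The function name and the input numbers are hypothetical.

```python
def estimate_interactome_size(observed, precision, completeness,
                              assay_sensitivity, sampling_sensitivity):
    """Scale the observed interaction count up to an estimated total.

    observed * precision approximates the true interactions found.
    completeness * assay_sensitivity * sampling_sensitivity approximates
    the fraction of all true interactions the screen could recover.
    """
    fractions = {
        "precision": precision,
        "completeness": completeness,
        "assay_sensitivity": assay_sensitivity,
        "sampling_sensitivity": sampling_sensitivity,
    }
    for name, value in fractions.items():
        if not 0 < value <= 1:
            raise ValueError(f"{name} must be in (0, 1], got {value}")
    recovered_fraction = completeness * assay_sensitivity * sampling_sensitivity
    return observed * precision / recovered_fraction


# Invented inputs, chosen only to make the arithmetic easy to follow.
estimate = estimate_interactome_size(
    observed=1000,
    precision=0.8,
    completeness=0.1,
    assay_sensitivity=0.2,
    sampling_sensitivity=0.5,
)
```

With these invented inputs, about 800 of the 1,000 reported pairs are taken as true and the screen is assumed to recover about 1% of all true interactions, so the estimate comes out near 80,000.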
Figure 3. Integrated Networks. Coexpression and phenotypic profiling can be thought of as matrices comprising all genes of an organism against all conditions that this organism has been exposed to within a given expression compendium and all phenotypes tested, respectively. For any correlation measurement, Pearson correlation coefficients (PCCs) being commonly used, the threshold between what is considered coexpressed and noncoexpressed needs to be set using appropriate titration procedures. Pairs of genes whose expression or phenotype profiles are above the determined threshold are then linked. The resulting integrated networks have powerful predictive value. Adapted from Gunsalus et al. (2005).
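The thresholding step described in the Figure 3 legend can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of the cited studies. It assumes a hypothetical dictionary mapping gene names to expression values across the same ordered conditions, and a threshold picked by hand rather than by titration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # flat profile: correlation undefined, treated here as no link
    return cov / (spread_x * spread_y)


def coexpression_edges(profiles, threshold):
    """Link every gene pair whose profile correlation reaches the threshold."""
    edges = []
    for gene_1, gene_2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[gene_1], profiles[gene_2])
        if r >= threshold:
            edges.append((gene_1, gene_2, r))
    return edges


# Hypothetical expression values over four conditions.
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.1, 5.9, 8.0],
    "gene_c": [4.0, 3.0, 2.0, 1.0],
    "gene_d": [1.0, 3.0, 2.0, 4.0],
}
edges = coexpression_edges(profiles, threshold=0.9)
```

The same function applies unchanged to phenotypic profiles, since both data types are gene-by-measurement matrices. Lowering the threshold adds edges, which is why the legend stresses that the cutoff has to be titrated rather than assumed.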

In addition to transcription factor activities, overall gene transcript levels are also regulated post-transcriptionally by microRNAs (miRNAs), short noncoding RNAs that bind to complementary cis-regulatory RNA sequences usually located in 3′ untranslated regions (UTRs) of target mRNAs (Lee et al., 2004; Ruvkun et al., 2004). miRNAs are not expected to act as master regulators, but rather act post-transcriptionally to fine-tune gene expression by modulating the levels of target mRNAs. Complex networks are formed by miRNAs interacting with their targets. In such networks, nodes are either a miRNA or a target 3′UTR, and edges represent the complementary annealing of the miRNA to the target RNA. Edges can be said to be incoming (miRNA binds a 3′UTR element) or outgoing (3′UTR element bound by a miRNA) (Martinez et al., 2008). The targets of miRNAs are generally computationally predicted, as experimental methodologies to map miRNA/3′UTR interactions at high-throughput are just coming online (Karginov et al., 2007; Guo et al., 2010; Hafner et al., 2010). Since transcription factors regulate the expression of miRNAs, it is however possible to combine Y1H methods with computationally predicted miRNA/3′UTR interactions, a strategy which was used to derive a large-scale miRNA network in C. elegans (Martinez et al., 2008) and which could be extended to other genomes.

Integrating Interactome Networks with other Cellular Networks

The three interactome network types considered so far, metabolic, protein-protein interaction, and gene regulatory networks, are composed of physical or biochemical interactions between macromolecules. The corresponding network maps provide crucial "scaffold" information about cellular systems, on top of which additional layers of functional links can be added to fine-tune the representation of biological reality (Figure 3) (Vidal, 2001). Networks composed of functional links, although strikingly different in terms of what the edges represent, can nevertheless complement what can be learned from interactome network maps in powerful ways, and vice versa. Networks of functional links represent a category of cellular networks that can be derived from indirect, or "conceptual," interactions where links between genes and gene products are reported based upon functional relationships or similarities, independently of physical macromolecular interactions. We consider three types of functional networks that have been mapped thus far at the scale of whole genomes and used together with physical interactome networks to interrogate the complexities of genotype-to-phenotype relationships.

Transcriptional Profiling Networks

Gene products that function together in common signaling cascades or protein complexes are expected to show greater similarities in their expression patterns than random sets of gene products. But how does this expectation translate at the level of whole proteomes and transcriptomes? How do transcriptome states correlate globally with interactome networks? Since the original description of microarray and DNA chip techniques and more recently de novo RNA sequencing using next-generation sequencing technologies, vast compendiums of gene expression datasets have been generated for many different species across a multitude of diverse genetic and environmental conditions. This type of information can be thought of as matrices comprising all genes of an organism against all conditions that this organism has been exposed to within a given expression compendium (Vidal, 2001). In the resulting coexpression networks, nodes represent genes, and edges link pairs of genes that show correlated coexpression above a set threshold (Kim et al., 2001; Stuart et al., 2003). For any correlation measurement, Pearson correlation coefficients (PCCs) being commonly used, the threshold between what is considered coexpressed and not coexpressed needs to be set using appropriate titration procedures (Stuart et al., 2003; Gunsalus et al., 2005).

Integration attempts in yeast, combining physical protein-protein interaction maps with coexpression profiles, revealed that interacting proteins are more likely to be encoded by genes with similar expression profiles than noninteracting proteins (Ge et al., 2001; Grigoriev, 2001; Jansen et al., 2002; Kemmeren et al., 2002). These observations were subsequently confirmed in many other organisms (Ge et al., 2003). Beyond the fundamental aspect of finding significant overlaps between interaction edges in interactome networks and coexpression edges in transcription profiling networks, these observations have been used to estimate the overall biological significance of interactome datasets. While correlations can be statistically significant over

huge datasets, still many valid biologically relevant protein-protein interactions correspond to pairs of genes whose expression is uncorrelated or even anticorrelated. Coexpression similarity links need not be perfectly overlapping with physical interactions of the corresponding gene products and vice versa.

In another example of what coexpression networks can be used for, preliminary steps have been taken to delineate gene regulatory networks from coexpression profiles (Segal et al., 2003; Amit et al., 2009). Such network constructions provide verifiable hypotheses about how regulatory pathways operate.

Phenotypic Profiling Networks

Perturbations of genes that encode functionally related products often confer similar phenotypes. Systematic use of gene knock-out strategies developed for yeast (Giaever et al., 2002) and knock-down approaches using RNA interference (RNAi) for C. elegans, Drosophila, and, recently, human (Mohr et al., 2010), are amenable to the perturbation of (nearly) all genes and subsequent testing of a wide variety of standardized phenotypes. As with transcriptional profiling networks, this type of information can be thought of as matrices comprising all genes of an organism and all phenotypes tested within a given phenotypic profiling compendium. In the resulting phenotypic similarity or "phenome" network, nodes represent genes, and edges link pairs of genes that show correlated phenotypic profiles above a set threshold. Here again, titration is needed to decide on the threshold between what is considered phenotypically similar and what is not (Gunsalus et al., 2005).

The earliest evidence that phenotypic profiling or "phenome" networks might help in interpreting protein-protein interactome networks was obtained in studies of the C. elegans DNA damage response and hermaphrodite germline (Boulton et al., 2002; Piano et al., 2002; Walhout et al., 2002). The physical basis of phenome networks is not yet completely defined, though there are strong overlaps between correlated phenotypic profiles and physical protein-protein interactions (Walhout et al., 2002; Gunsalus et al., 2005). Overlapping three network types, binary interactions, coexpression, and phenotype profiling, produces integrated networks with high predictive power, as demonstrated for C. elegans early embryogenesis (Walhout et al., 2002; Gunsalus et al., 2005). Integration of transcriptional regulatory networks with these other network types has also been undertaken in worm (Grove et al., 2009).

Comprehensive genome-wide phenome networks are now a reality for the yeast S. cerevisiae (Giaever et al., 2002), and are expected to be further developed for C. elegans (Sönnichsen et al., 2005) and Drosophila (Mohr et al., 2010). Now that RNAi reagents are available for nearly all genes of mouse and human (Root et al., 2006), phenome maps for cell lines of these organisms should soon follow.

Genetic Interaction Networks

Pairs of functionally related genes tend to exhibit genetic interactions, defined by comparing the phenotype conferred by mutations in pairs of genes (double mutants) to the phenotype conferred by either one of these mutations alone (single mutants). Genetic interactions are classified as negative, i.e. aggravating, synthetic sick or lethal, when the phenotype of double mutants is significantly worse than expected from that of single mutants, and positive, i.e. alleviating or suppressive, when the phenotype of double mutants is significantly better than that expected from the single mutants (Mani et al., 2008). Though finding genetic interactions has been crucial to geneticists for decades (Sturtevant, 1956; Novick et al., 1989), only in the last ten years has functional genomics advanced sufficiently to allow systematic high-throughput mapping of genetic interactions to give rise to large-scale networks (Boone et al., 2007).

Two general strategies have been followed for the systematic mapping of genetic interactions in yeast. Synthetic genetic arrays (SGA) and derivative methodologies use high-density arrays of double mutants by mating pairs from an available set of single mutants (Tong et al., 2001; Boone et al., 2007). Alternative strategies take advantage of sequence barcodes embedded in a set of yeast deletion mutants (Giaever et al., 2002; Beltrao et al., 2010) to measure the relative growth rate in a population of double mutants by hybridization to anti-barcode microarrays (Pan et al., 2004; Boone et al., 2007). These two approaches seem to capture similar aspects of genetic interactions, as the overlap between the two types of datasets is significant (Costanzo et al., 2010).

Patterns of genetic interactions can be used to define a kind of network that is similar to phenotypic profiling or phenome networks. As with transcriptional and phenotypic profiling networks, this type of information can be thought of as matrices comprising all genes of an organism and the genes with which they exhibit a genetic interaction. In such "genetic interaction profiling" networks, edges functionally link two genes based on high similarities of genetic interaction profiles. Here again, predictive models of biological processes can be obtained when such networks are combined with other types of interactome networks.

Integration of genetic interaction networks with other types of interactome network maps provides potentially powerful models. While genetic interactions do not necessarily correspond to physical interactions between the corresponding gene products (Mani et al., 2008), interesting patterns emerge between the different datasets. Because they tend to reveal pairs of genes involved in parallel pathways or in different molecular machines, negative genetic interactions tend not to correlate with either protein associations in protein complexes or with binary protein-protein interactions (Beltrao et al., 2010; Costanzo et al., 2010). In contrast, positive genetic interactions tend to point to pairs of gene products physically associated with each other. This trend is usually explained by loss of either one or two gene products working together in a molecular complex resulting in similar effects (Beltrao et al., 2010).

Graph Properties of Networks

A critical realization over the past decade is that the structure and evolution of networks appearing in natural, technological, and social systems over time follows a series of basic and reproducible organizing principles. Theoretical advances in network science (Albert and Barabási, 2002), paralleling advances in high-throughput efforts to map biological networks, have provided a conceptual framework with which to interpret large interactome network maps. Full understanding of the internal organization of a cell requires awareness of the constraints and laws that biological networks follow. We summarize several

principles of network theory that have immediate applications to biology.

Degree Distribution and Hubs

Any empirical investigation starts with the same question: could the investigated phenomena have emerged by chance, or could random effects account for them? The earliest network models assumed that complex networks are wired randomly, such that any two nodes are connected by a link with the same probability p. This Erdős-Rényi model generates a network with a Poisson degree distribution, which implies that most nodes have approximately the same degree, that is, the same number of links, while nodes that have significantly more or fewer links than any average node are exceedingly rare or altogether absent. In contrast, many real networks, from the world wide web to social networks, are scale-free (Barabási and Albert, 1999), which means that their degree distribution follows a power law rather than the expected Poisson distribution. In a scale-free network most nodes have only a few interactions, and these coexist with a few highly connected nodes, the hubs, that hold the whole network together. This scale-free property has been found in all organisms for which protein-protein interaction and metabolic network maps exist, from yeast to human (Barabási and Oltvai, 2004; Seebacher and Gavin, 2011). Regulatory networks, however, show a mixed behavior. The outgoing degree distribution, corresponding to how many different genes a transcription factor can regulate, is scale-free, meaning that some master regulators can regulate hundreds of genes. In contrast, the incoming degree distribution, corresponding to how many transcription factors regulate a specific gene, best fits an exponential model (Deplancke et al., 2006), indicating that genes that are simultaneously regulated by large numbers of transcription factors are exponentially rare.

Gene Duplication as the Origin of the Scale-Free Property

The scale-free topology of biological networks likely originates from gene duplication. While the principle applies from metabolic to regulatory networks, it is best illustrated in protein-protein interaction networks, where it was first proposed (Pastor-Satorras et al., 2003; Vázquez et al., 2003). When cells divide and the genome replicates, occasionally an extra copy of one or several genes or chromosomes gets produced. Immediately following a duplication event, both the original protein and the new extra copy have the same structure, so both interact with the same set of partners. Consequently, each of the protein partners that interacted with the ancestor gains a new interaction. This process results in a "rich-get-richer" phenomenon (Barabási and Albert, 1999), where proteins with a large number of interactions tend to gain links more often, as it is more likely that they interact with a duplicated protein. This mechanism has been shown to generate hubs (Pastor-Satorras et al., 2003; Vázquez et al., 2003), and so could be responsible for the scale-free property of protein-protein interaction networks.

The Role of Hubs

Network biology attempts to identify global properties in interactome network graphs, and subsequently relate such properties to biological reality by integrating various functional datasets. One of the best examples where this approach was successful is in defining the role of hubs. In the model organisms S. cerevisiae and C. elegans, hub proteins were found to: (1) correspond to essential genes (Jeong et al., 2001), (2) be older and have evolved more slowly (Fraser et al., 2002), (3) have a tendency to be more abundant (Ivanic et al., 2009), and (4) have a larger diversity of phenotypic outcomes resulting from their deletion compared to the deletion of less connected proteins (Yu et al., 2008). While the evidence attributed to some of these findings has been debated (Jordan et al., 2003; Yu et al., 2008; Ivanic et al., 2009), the special role of hub proteins in model organisms led to the expectation that, in humans, hubs should preferentially encode disease-related genes. Indeed, upregulated genes in lung squamous cell carcinoma tissues tend to have a high degree in protein-protein interaction networks (Wachi et al., 2005), and cancer-related proteins have, on average, twice as many interaction partners as noncancer proteins in protein-protein interaction networks (Jonsson and Bates, 2006). A cautionary note is necessary: since disease-related proteins tend to be more avidly studied, their higher connectivity may be partly rooted in investigative biases. Therefore, this type of finding needs to be appropriately controlled using systematic proteome-wide interactome network maps.

Understanding the role of hubs in human disease requires distinguishing between essential genes and disease-related genes (Goh et al., 2007). Some human genes are essential for early development, such that mutations in them often lead to spontaneous abortions. The protein products of mouse in utero essential genes show a strong tendency to be associated with hubs and to be expressed in multiple tissues (Goh et al., 2007). Nonessential disease genes tend not to be encoded by hubs and tend to be tissue specific. These differences can be best appreciated from an evolutionary perspective. Mutations that disrupt hubs may have difficulty propagating in the population, as the host may not survive long enough to have offspring. Only mutations that impair functionally and topologically peripheral genes can persist, becoming responsible for heritable diseases, particularly those that manifest in adulthood.

Date and Party Hubs

Another success in uncovering the functional consequences of the topology of interactome networks was provided by the discovery of date and party hubs (Han et al., 2004). Upon integrating protein-protein interaction network data with transcriptional profiling networks for yeast, at least two classes of hubs can be discriminated. Party hubs are highly coexpressed with their interacting partners while date hubs appear to be more dynamically regulated relative to their partners (Han et al., 2004). In other words, date hubs interact with their partners at different times and/or different conditions, whereas party hubs seem to interact with their partners at all times or conditions tested (Seebacher and Gavin, 2011). Despite the preponderance of evidence in its favor, the date and party hubs concept remains a subject of debate (Agarwal et al., 2010), attributable to the necessity to appropriately calibrate coexpression and protein-protein interaction hub thresholds when analyzing new transcriptome and interactome datasets (Bertin et al., 2007).

Fundamentally, date hubs preferentially connect functional modules to each other, whereas party hubs preferentially act inside functional modules, hence they are occasionally called inter-module and intra-module hubs, respectively (Han et al.,

2004; Taylor et al., 2009). Date hubs are less evolutionarily constrained than party hubs (Fraser, 2005; Ekman et al., 2006; Bertin et al., 2007). Party hubs contain fewer and shorter regions of intrinsic disorder than do date hubs (Ekman et al., 2006; Singh et al., 2006; Kahali et al., 2009) and contain fewer linear motifs (short binding motifs and post-translational modification sites) than do date hubs (Taylor et al., 2009). Initially explored in a yeast interactome (Han et al., 2004; Ekman et al., 2006), the distinction between date and party hubs can be recapitulated in human interactomes as well (Taylor et al., 2009).

Motifs

There has been considerable attention paid in recent years to network motifs, which are characteristic network patterns, or subgraphs, in biological networks that appear more frequently than expected given the degree distribution of the network (Milo et al., 2002). Such subgraphs have been found to be associated with desirable (or undesirable) biological function (or dysfunction). Hence identification and classification of motifs can offer information about the various network subgraphs needed for biological processes. It is now commonly understood that motifs constitute the basic building blocks of cellular networks (Milo et al., 2002; Yeger-Lotem et al., 2004).

Originally identified in transcriptional regulatory networks of several model organisms (Milo et al., 2002; Shen-Orr et al., 2002), motifs have been subsequently identified in interactome networks and in integrated composite networks (Yeger-Lotem et al., 2004; Zhang et al., 2005). Different types of networks exhibit different motif profiles, suggesting a means for network classification (Milo et al., 2004; Zhang et al., 2005). The high degree of evolutionary conservation of motif constituents within interaction networks (Wuchty et al., 2003), combined with the convergent evolution that is seen in the transcription regulatory networks of diverse species toward the same motif types (Barabási and Oltvai, 2004), makes a strong argument that motifs are of direct biological relevance. Classification of several highly significant motifs of two, three, and four nodes, with descriptors like coherent feed forward loop or single-input module, has shown that specific types of motifs carry out specific dynamic functions within cells (Alon, 2007; Shoval and Alon, 2010).

Topological, Functional, and Disease Modules

Most biological networks have a rather uneven organization. Many nodes are part of locally dense neighborhoods, or topological modules, where nodes have a higher tendency to link to nodes within the same local neighborhood than to nodes outside of it (Ravasz et al., 2002). A region of the global network diagram that corresponds to a potential topological module can be identified by network clustering algorithms which are blind to the function of individual nodes. These topological modules are often believed to carry specific cellular functions, hence leading to the concept of a functional module, an aggregation of nodes of similar or related function in the same network neighborhood. Interest is increasing in disease modules, which represent groups of network components whose disruption results in a particular disease phenotype in humans (Barabási et al., 2010).

There is a tacit assumption, based on evidence in the biological literature, that cellular components forming topological modules have closely related functions, thus corresponding to functional modules. New potentially powerful methods to identify topological and functional clusterings continue to be described (Ahn et al., 2010). Such modules can serve as hypothesis-building tools to identify regions of the interactome likely involved in particular cellular functions or disease (Barabási et al., 2010).

Networks and Human Diseases

Having reviewed why biological networks are important to consider, how they can be mapped and integrated with each other, and what global properties are starting to emerge from such models, we next return to our original question: to what extent do biological systems and cellular networks underlie genotype-phenotype relationships in human disease? We attempt to provide answers by covering four recent advances in network biology: (1) studies of global relationships between human disorders, associated genes and interactome networks, (2) predictions of new human disease-associated genes using interactome models, (3) analyses of network perturbations by pathogens, and (4) emergence of node removal versus edge-specific or "edgetic" models to explain genotype-phenotype relationships.

Global Disease Networks

One of the main predictions derived from the hypothesis that human disorders should be viewed as perturbations of highly interlinked cellular networks is that diseases should not be independent from each other, but should instead be themselves highly interconnected. Such potential cellular network-based dependencies between human diseases have led to the generation of various global disease network maps, which link disease phenotypes together if some molecular or phenotypic relationships exist between them. Such a map was built using known gene-disease associations collected in the OMIM database (Goh et al., 2007), where nodes are diseases and two diseases are linked by an edge if they share at least one common gene, mutations in which are associated with these diseases. In the obtained disease network more than 500 human genetic disorders belong to a single interconnected main giant component, consistent with the idea that human diseases are much more connected to each other than anticipated. The flipside of this representation of connectivity is a network of disease-associated genes linked together if mutations in these genes are known to be responsible for at least one common disorder. Providing support for our general hypothesis that perturbations in cellular networks underlie genotype-phenotype relationships, such disease-associated gene networks overlap significantly with human protein-protein interactome network maps (Goh et al., 2007).

Additional types of connectivity between large numbers of human diseases can be found in "comorbidity" networks where diseases are linked to each other when individuals who were diagnosed for one particular disease are more likely to have also been diagnosed for the other (Rzhetsky et al., 2007; Hidalgo et al., 2009). Diabetes and obesity represent probably the best known disease pair with such significant comorbidity. While comorbidity can have multiple origins, ranging from environmental factors to treatment side effects, its potential molecular origin has attracted considerable attention. A network biology interpretation would suggest that the molecular defects responsible for

one of a pair of diseases can "spread along" the edges in cellular networks, affecting the activity of related gene products and causing or affecting the outcome of the other disease (Park et al., 2009).

Predicting Disease Related Genes by Using Interactome Networks

If cellular networks underlie genotype-phenotype relationships, then network properties should be predictive of novel, yet to be identified human disease-associated genes. In an early example, it was shown that the products of a few dozen ataxia-associated genes occupy particular locations in the human interactome network, in that the number of edges separating them is on average much lower than for random sets of gene products (Lim et al., 2006). Physical protein-protein interactome network maps can indeed generate lists of genes potentially enriched for new candidate disease genes or modifier genes of known disease genes (Lim et al., 2006; Oti et al., 2006; Fraser and Plotkin, 2007).

Integration of various interactome and functional relationship networks has also been applied to reveal genes potentially involved in cancer (Pujana et al., 2007). Integrating a coexpression network, seeded with four well-known breast cancer associated genes, together with genetic and physical interactions, yielded a breast cancer network model out of which candidate cancer susceptibility and modifier genes could be predicted (Pujana et al., 2007). Integrative network modeling strategies are applicable to other types of cancer and other types of disease (Ergun et al., 2007; Wu et al., 2008; Lee et al., 2010).

Network Perturbations by Pathogens

Pathogens, particularly viruses, have evolved sophisticated mechanisms to perturb the intracellular networks of their hosts to their advantage. As obligate intracellular pathogens, viruses must intimately rewire cellular pathways to their own ends to maintain infectivity. Since many virus-host interactions happen at the level of physical protein-protein interactions, systematic maps capturing viral-host physical protein-protein interactions, or "virhostome" maps, have been obtained using Y2H for Epstein-Barr virus (Calderwood et al., 2007), hepatitis C virus (de Chassey et al., 2008), several herpesviruses (Uetz et al., 2006), influenza virus (Shapira et al., 2009) and others (Mendez-Rios and Uetz, 2010), and by co-AP/MS methodologies for HIV (Jäger et al., 2010). An eminent goal is to find perturbations in network properties of the host network, properties that would not be made evident by small-scale investigations focused on one or a handful of viral proteins. For instance, it has been found several times now that viral proteins preferentially target hubs in host interactome networks (Calderwood et al., 2007; Shapira et al., 2009). The many host targets identified in virhostome screens are now getting biologically validated by RNAi knock-down and transcriptional profiling, leading to detailed maps of the interactions underlying viral-host relationships (Shapira et al., 2009).

Another impetus for mapping virhostome networks is that virus protein interactions can act as surrogates for human genetic variations, inducing disease states by influencing local and global properties of cellular networks. The inspiration for this concept emerged from classical observations such as the binding of Adenovirus E1A, HPV E7, and SV40 Large T antigen to the human retinoblastoma protein, which is the product of a gene in which mutations lead to a predisposition to retinoblastoma and other types of cancers (DeCaprio, 2009). This hypothesis will soon be tested globally by systematic investigations of how host networks, including physical interaction, gene regulatory and genetic interaction networks, are perturbed upon viral infection. Pathogen-host interaction mapping projects are also in their first iterations, with similar goals of identifying emergent global properties and disease surrogates. As microbial pathogens can have thousands of gene products relative to much smaller numbers for most viruses, such projects will require considerably more effort and time.

Edgetics

Our underlying premise throughout has been that phenotypic variations of an organism, particularly those that result in human disease, arise from perturbations of cellular interactome networks. These alterations range from the complete loss of a gene product, through the loss of some but not all interactions, to the specific perturbation of a single molecular interaction while retaining all others. In interactome networks these alterations range from node removal at one end to edge-specific, "edgetic" perturbations at the other (Zhong et al., 2009). The consequences on network structure and function are expected to be radically dissimilar for node removal versus edgetic perturbation. Node removal not only disables the function of a node but also disables all the interactions of that node with other nodes, disrupting in some way the function of all of the neighboring nodes. An edgetic disruption, removing one or a few interactions but leaving the rest intact and functioning, has subtler effects on the network, though not necessarily on the resulting phenotype (Madhani et al., 1997). The distinction between node removal and edgetic perturbation models can provide new clues on mechanisms underlying human disease, such as the different classes of mutations that lead to dominant versus recessive modes of inheritance (Zhong et al., 2009).

The idea that the disruption of specific protein interactions can lead to human disease (Schuster-Bockler and Bateman, 2008) complements canonical gene loss/perturbation models (Botstein and Risch, 2003), and is poised to explain confounding genetic phenomena such as genetic heterogeneity.

Matching the edgetic hypothesis to inherited human diseases, approximately half of 50,000 Mendelian alleles available in the human gene mutation database can be modeled as potentially edgetic if one considers deletions and truncating mutations as node removal, and in-frame point mutations leading to single amino-acid changes and small insertions and deletions as edgetic perturbations (Zhong et al., 2009). This number is probably a good approximation, since thus far disease-associated genes predicted to bear edgetic alleles using this model have been experimentally confirmed (Zhong et al., 2009). For genes associated with multiple disorders and for which predicted protein interaction domains are available, it was shown that putative edgetic alleles responsible for different disorders tend to be located in different interaction domains, consistent with different edgetic perturbations conferring strikingly different phenotypes.

Conclusion

A comprehensive catalog of sequence variations among the 7 billion human genomes present on earth might soon become

available. This information will continue to revolutionize biology in general and medicine in particular for many decades and perhaps centuries to come. The prospects of predictive and personalized medicine are enormous. However, it should be kept in mind that genome variations merely constitute variations in the parts list and often fail to provide a description of the mechanistic consequences on cellular functions.

Here, we have summarized why considering perturbations of biological networks within cells is crucial to help interpret how genome variations relate to phenotypic differences. Given their high levels of complexity, it is no surprise that interactome networks have not yet been mapped completely. The data and models accumulated in the last decade point to clear directions for the next decade. We envision that with more interactome datasets of increasingly high quality, the trends reviewed here will be fine tuned. The global properties observed so far and those yet to be uncovered should help “make sense” of the enormous body of information encompassed in the human genome.

ACKNOWLEDGMENTS

We thank David E. Hill, Matija Dreze, Anne-Ruxandra Carvunis, Benoit Charloteaux, Quan Zhong, Balaji Santhanam, Sam Pevzner, Song Yi, Nidhi Sahni, Jean Vandenhaute, and Roseann Vidal for careful reading of the manuscript. Interactome mapping efforts at CCSB have been supported mainly by National Institutes of Health grant R01-HG001715. M.V. is grateful to Nadia Rosenthal for the peaceful Suttonian environment. We apologize to those in the field whose important work was not cited here due to space limitation.

REFERENCES

Agarwal, S., Deane, C.M., Porter, M.A., and Jones, N.S. (2010). Revisiting date and party hubs: novel approaches to role assignment in protein interaction networks. PLoS Comput. Biol. 6, e1000817.
Ahn, Y.Y., Bagrow, J.P., and Lehmann, S. (2010). Link communities reveal multiscale complexity in networks. Nature 466, 761–764.
Albert, R., and Barabasi, A.L. (2002). Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97.
Alon, U. (2007). Network motifs: theory and experimental approaches. Nat. Rev. Genet. 8, 450–461.
Altshuler, D., Daly, M.J., and Lander, E.S. (2008). Genetic mapping in human disease. Science 322, 881–888.
Amberger, J., Bocchini, C.A., Scott, A.F., and Hamosh, A. (2009). McKusick’s Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Res. 37, D793–D796.
Amit, I., Garber, M., Chevrier, N., Leite, A.P., Donner, Y., Eisenhaure, T., Guttman, M., Grenier, J.K., Li, W., Zuk, O., et al. (2009). Unbiased reconstruction of a mammalian transcriptional network mediating pathogen responses. Science 326, 257–263.
Barabási, A.L., and Albert, R. (1999). Emergence of scaling in random networks. Science 286, 509–512.
Barabási, A.L., Gulbahce, N., and Loscalzo, J. (2010). Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 12, 56–68.
Barabási, A.L., and Oltvai, Z.N. (2004). Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5, 101–113.
Bartel, P.L., Roecklein, J.A., SenGupta, D., and Fields, S. (1996). A protein linkage map of Escherichia coli bacteriophage T7. Nat. Genet. 12, 72–77.
Beadle, G.W., and Tatum, E.L. (1941). Genetic control of biochemical reactions in Neurospora. Proc. Natl. Acad. Sci. USA 27, 499–506.
Beltrao, P., Cagney, G., and Krogan, N.J. (2010). Quantitative genetic interactions reveal biological modularity. Cell 141, 739–745.
Bertin, N., Simonis, N., Dupuy, D., Cusick, M.E., Han, J.D., Fraser, H.B., Roth, F.P., and Vidal, M. (2007). Confirmation of organized modularity in the yeast interactome. PLoS Biol. 5, e153.
Blake, W.J., Kaern, M., Cantor, C.R., and Collins, J.J. (2003). Noise in eukaryotic gene expression. Nature 422, 633–637.
Boone, C., Bussey, H., and Andrews, B.J. (2007). Exploring genetic interactions and networks with yeast. Nat. Rev. Genet. 8, 437–449.
Botstein, D., and Risch, N. (2003). Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat. Genet. Suppl. 33, 228–237.
Botstein, D., White, R.L., Skolnick, M., and Davis, R.W. (1980). Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am. J. Hum. Genet. 32, 314–331.
Boulton, S.J., Gartner, A., Reboul, J., Vaglio, P., Dyson, N., Hill, D.E., and Vidal, M. (2002). Combined functional genomic maps of the C. elegans DNA damage response. Science 295, 127–131.
Calderwood, M.A., Venkatesan, K., Xing, L., Chase, M.R., Vazquez, A., Holthaus, A.M., Ewence, A.E., Li, N., Hirozane-Kishikawa, T., Hill, D.E., et al. (2007). Epstein-Barr virus and virus human protein interaction maps. Proc. Natl. Acad. Sci. USA 104, 7606–7611.
Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A.J., et al. (2004). Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116, 499–509.
Charbonnier, S., Gallego, O., and Gavin, A.C. (2008). The social network of a cell: recent advances in interactome mapping. Biotechnol. Annu. Rev. 14, 1–28.
Colland, F., Jacq, X., Trouplin, V., Mougin, C., Groizeleau, C., Hamburger, A., Meil, A., Wojcik, J., Legrain, P., and Gauthier, J.M. (2004). Functional proteomics mapping of a human signaling pathway. Genome Res. 14, 1324–1332.
Collins, S.R., Kemmeren, P., Zhao, X.C., Greenblatt, J.F., Spencer, F., Holstege, F.C., Weissman, J.S., and Krogan, N.J. (2007). Towards a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol. Cell. Proteomics 6, 439–450.
Costanzo, M., Baryshnikova, A., Bellay, J., Kim, Y., Spear, E.D., Sevier, C.S., Ding, H., Koh, J.L., Toufighi, K., Mostafavi, S., et al. (2010). The genetic landscape of a cell. Science 327, 425–431.
Cusick, M.E., Yu, H., Smolyar, A., Venkatesan, K., Carvunis, A.R., Simonis, N., Rual, J.F., Borick, H., Braun, P., Dreze, M., et al. (2009). Literature-curated protein interaction datasets. Nat. Methods 6, 39–46.
de Chassey, B., Navratil, V., Tafforeau, L., Hiet, M.S., Aublin-Gex, A., Agaugue, S., Meiffren, G., Pradezynski, F., Faria, B.F., Chantier, T., et al. (2008). Hepatitis C virus infection protein network. Mol. Syst. Biol. 4, 230.
Davis, R.H. (2004). The age of model organisms. Nat. Rev. Genet. 5, 69–76.
DeCaprio, J.A. (2009). How the Rb tumor suppressor structure and function was revealed by the study of Adenovirus and SV40. Virology 384, 274–284.
Deplancke, B., Dupuy, D., Vidal, M., and Walhout, A.J. (2004). A Gateway-compatible yeast one-hybrid system. Genome Res. 14, 2093–2101.
Deplancke, B., Mukhopadhyay, A., Ao, W., Elewa, A.M., Grove, C.A., Martinez, N.J., Sequerra, R., Doucette-Stamm, L., Reece-Hoyes, J.S., Hope, I.A., et al. (2006). A gene-centered C. elegans protein-DNA interaction network. Cell 125, 1193–1205.
Dreze, M., Monachello, D., Lurin, C., Cusick, M.E., Hill, D.E., Vidal, M., and Braun, P. (2010). High-quality binary interactome mapping. Methods Enzymol. 470, 281–315.
Edwards, J.S., Ibarra, R.U., and Palsson, B.O. (2001). In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nat. Biotechnol. 19, 125–130.
Ekman, D., Light, S., Bjorklund, A.K., and Elofsson, A. (2006). What properties characterize the hub proteins of the protein-protein interaction network of Saccharomyces cerevisiae? Genome Biol. 7, R45.
Elowitz, M.B., Levine, A.J., Siggia, E.D., and Swain, P.S. (2002). Stochastic gene expression in a single cell. Science 297, 1183–1186.
Ergun, A., Lawrence, C.A., Kohanski, M.A., Brennan, T.A., and Collins, J.J. (2007). A network biology approach to prostate cancer. Mol. Syst. Biol. 3, 82.
Ewing, B., Hillier, L., Wendl, M.C., and Green, P. (1998). Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185.
Fields, S., and Song, O. (1989). A novel genetic system to detect protein-protein interactions. Nature 340, 245–246.
Finley, R.L., Jr., and Brent, R. (1994). Interaction mating reveals binary and ternary connections between Drosophila cell cycle regulators. Proc. Natl. Acad. Sci. USA 91, 12980–12984.
Fraser, H.B. (2005). Modularity and evolutionary constraint on proteins. Nat. Genet. 37, 351–352.
Fraser, H.B., Hirsh, A.E., Steinmetz, L.M., Scharfe, C., and Feldman, M.W. (2002). Evolutionary rate in the protein interaction network. Science 296, 750–752.
Fraser, H.B., and Plotkin, J.B. (2007). Using protein complexes to predict phenotypic effects of gene mutation. Genome Biol. 8, R252.
Fromont-Racine, M., Rain, J.C., and Legrain, P. (1997). Toward a functional analysis of the yeast genome through exhaustive two-hybrid screens. Nat. Genet. 16, 277–282.
Gavin, A.C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L.J., Bastuck, S., Dumpelfeld, B., et al. (2006). Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 631–636.
Gavin, A.C., Bosche, M., Krause, R., Grandi, P., Marzioch, M., Bauer, A., Schultz, J., Rick, J.M., Michon, A.M., Cruciat, C.M., et al. (2002). Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415, 141–147.
Ge, H., Liu, Z., Church, G.M., and Vidal, M. (2001). Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nat. Genet. 29, 482–486.
Ge, H., Walhout, A.J., and Vidal, M. (2003). Integrating ‘omic’ information: a bridge between genomics and systems biology. Trends Genet. 19, 551–560.
Giaever, G., Chu, A.M., Ni, L., Connelly, C., Riles, L., Veronneau, S., Dow, S., Lucau-Danila, A., Anderson, K., Andre, B., et al. (2002). Functional profiling of the Saccharomyces cerevisiae genome. Nature 418, 387–391.
Giot, L., Bader, J.S., Brouwer, C., Chaudhuri, A., Kuang, B., Li, Y., Hao, Y.L., Ooi, C.E., Godwin, B., Vitols, E., et al. (2003). A protein interaction map of Drosophila melanogaster. Science 302, 1727–1736.
Goh, K.I., Cusick, M.E., Valle, D., Childs, B., Vidal, M., and Barabasi, A.L. (2007). The human disease network. Proc. Natl. Acad. Sci. USA 104, 8685–8690.
Grigoriev, A. (2001). A relationship between gene expression and protein interactions on the proteome scale: analysis of the bacteriophage T7 and the yeast Saccharomyces cerevisiae. Nucleic Acids Res. 29, 3513–3519.
Grove, C.A., De Masi, F., Barrasa, M.I., Newburger, D.E., Alkema, M.J., Bulyk, M.L., and Walhout, A.J. (2009). A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors. Cell 138, 314–327.
Gunsalus, K.C., Ge, H., Schetter, A.J., Goldberg, D.S., Han, J.D., Hao, T., Berriz, G.F., Bertin, N., Huang, J., Chuang, L.S., et al. (2005). Predictive models of molecular machines involved in Caenorhabditis elegans early embryogenesis. Nature 436, 861–865.
Guo, H., Ingolia, N.T., Weissman, J.S., and Bartel, D.P. (2010). Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature 466, 835–840.
Hafner, M., Landthaler, M., Burger, L., Khorshid, M., Hausser, J., Berninger, P., Rothballer, A., Ascano, M., Jr., Jungkamp, A.C., Munschauer, M., et al. (2010). Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129–141.
Han, J.D., Bertin, N., Hao, T., Goldberg, D.S., Berriz, G.F., Zhang, L.V., Dupuy, D., Walhout, A.J., Cusick, M.E., Roth, F.P., et al. (2004). Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature 430, 88–93.
Hartley, J.L., Temple, G.F., and Brasch, M.A. (2000). DNA cloning using in vitro site-specific recombination. Genome Res. 10, 1788–1795.
Hidalgo, C.A., Blumm, N., Barabasi, A.L., and Christakis, N.A. (2009). A dynamic network approach for the study of human phenotypes. PLoS Comput. Biol. 5, e1000353.
Ho, Y., Gruhler, A., Heilbut, A., Bader, G.D., Moore, L., Adams, S.L., Millar, A., Taylor, P., Bennett, K., Boutilier, K., et al. (2002). Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415, 180–183.
Ito, T., Chiba, T., Ozawa, R., Yoshida, M., Hattori, M., and Sakaki, Y. (2001). A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl. Acad. Sci. USA 98, 4569–4574.
Ito, T., Tashiro, K., Muta, S., Ozawa, R., Chiba, T., Nishizawa, M., Yamamoto, K., Kuhara, S., and Sakaki, Y. (2000). Toward a protein-protein interaction map of the budding yeast: A comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins. Proc. Natl. Acad. Sci. USA 97, 1143–1147.
Ivanic, J., Yu, X., Wallqvist, A., and Reifman, J. (2009). Influence of protein abundance on high-throughput protein-protein interaction detection. PLoS ONE 4, e5815.
Jäger, S., Gulbahce, N., Cimermancic, P., Kane, J., He, N., Chou, S., D’Orso, I., Fernandes, J., Jang, G., Frankel, A.D., et al. (2010). Purification and characterization of HIV-human protein complexes. Methods 53, 13–19.
Jansen, R., Greenbaum, D., and Gerstein, M. (2002). Relating whole-genome expression data with protein-protein interactions. Genome Res. 12, 37–46.
Jeong, H., Mason, S.P., Barabasi, A.L., and Oltvai, Z.N. (2001). Lethality and centrality in protein networks. Nature 411, 41–42.
Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N., and Barabasi, A.L. (2000). The large-scale organization of metabolic networks. Nature 407, 651–654.
Johannsen, W. (1909). Elemente der exakten Erblichkeitslehre (Jena: Gustav Fischer).
Jonsson, P.F., and Bates, P.A. (2006). Global topological features of cancer proteins in the human interactome. Bioinformatics 22, 2291–2297.
Jordan, I.K., Wolf, Y.I., and Koonin, E.V. (2003). No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly. BMC Evol. Biol. 3, 1.
Kahali, B., Ahmad, S., and Ghosh, T.C. (2009). Exploring the evolutionary rate differences of party hub and date hub proteins in Saccharomyces cerevisiae protein-protein interaction network. Gene 429, 18–22.
Kanehisa, M., Araki, M., Goto, S., Hattori, M., Hirakawa, M., Itoh, M., Katayama, T., Kawashima, S., Okuda, S., Tokimatsu, T., et al. (2008). KEGG for linking genomes to life and the environment. Nucleic Acids Res. 36, D480–D484.
Karginov, F.V., Conaco, C., Xuan, Z., Schmidt, B.H., Parker, J.S., Mandel, G., and Hannon, G.J. (2007). A biochemical approach to identifying microRNA targets. Proc. Natl. Acad. Sci. USA 104, 19291–19296.
Kemmeren, P., van Berkum, N.L., Vilo, J., Bijma, T., Donders, R., Brazma, A., and Holstege, F.C. (2002). Protein interaction verification and functional annotation by integrated analysis of genome-scale data. Mol. Cell 9, 1133–1143.
Kim, S.K., Lund, J., Kiraly, M., Duke, K., Jiang, M., Stuart, J.M., Eizinger, A., Wylie, B.N., and Davidson, G.S. (2001). A gene expression map for Caenorhabditis elegans. Science 293, 2087–2092.
Krogan, N.J., Cagney, G., Yu, H., Zhong, G., Guo, X., Ignatchenko, A., Li, J., Pu, S., Datta, N., Tikuisis, A.P., et al. (2006). Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 637–643.
Kuhner, S., van Noort, V., Betts, M.J., Leo-Macias, A., Batisse, C., Rode, M., Yamada, T., Maier, T., Bader, S., Beltran-Alvarez, P., et al. (2009). Proteome organization in a genome-reduced bacterium. Science 326, 1235–1240.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860–921.
Lee, D.S., Park, J., Kay, K.A., Christakis, N.A., Oltvai, Z.N., and Barabasi, A.L. (2008). The implications of human metabolic network topology for disease comorbidity. Proc. Natl. Acad. Sci. USA 105, 9880–9885.
Lee, I., Lehner, B., Vavouri, T., Shin, J., Fraser, A.G., and Marcotte, E.M. (2010). Predicting genetic modifier loci using functional gene networks. Genome Res. 20, 1143–1153.
Lee, R., Feinbaum, R., and Ambros, V. (2004). A short history of a short RNA. Cell 116, S89–S92.
Lee, T.I., Rinaldi, N.J., Robert, F., Odom, D.T., Bar-Joseph, Z., Gerber, G.K., Hannett, N.M., Harbison, C.T., Thompson, C.M., Simon, I., et al. (2002). Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804.
Li, S., Armstrong, C.M., Bertin, N., Ge, H., Milstein, S., Boxem, M., Vidalain, P.O., Han, J.D., Chesneau, A., Hao, T., et al. (2004). A map of the interactome network of the metazoan C. elegans. Science 303, 540–543.
Lim, J., Hao, T., Shaw, C., Patel, A.J., Szabo, G., Rual, J.F., Fisk, C.J., Li, N., Smolyar, A., Hill, D.E., et al. (2006). A protein-protein interaction network for human inherited ataxias and disorders of Purkinje cell degeneration. Cell 125, 801–814.
Ma, H., Sorokin, A., Mazein, A., Selkov, A., Selkov, E., Demin, O., and Goryanin, I. (2007). The Edinburgh human metabolic network reconstruction and its functional analysis. Mol. Syst. Biol. 3, 135.
Madhani, H.D., Styles, C.A., and Fink, G.R. (1997). MAP kinases with distinct inhibitory functions impart signaling specificity during yeast differentiation. Cell 91, 673–684.
Mani, R., St Onge, R.P., Hartman, J.L., 4th, Giaever, G., and Roth, F.P. (2008). Defining genetic interaction. Proc. Natl. Acad. Sci. USA 105, 3461–3466.
Marcotte, E., and Date, S. (2001). Exploiting big biology: integrating large-scale biological data for function inference. Brief. Bioinform. 2, 363–374.
Martinez, N.J., Ow, M.C., Barrasa, M.I., Hammell, M., Sequerra, R., Doucette-Stamm, L., Roth, F.P., Ambros, V.R., and Walhout, A.J. (2008). A C. elegans genome-scale microRNA network contains composite feedback motifs with high flux capacity. Genes Dev. 22, 2535–2549.
Mendez-Rios, J., and Uetz, P. (2010). Global approaches to study protein-protein interactions among viruses and hosts. Future Microbiol. 5, 289–301.
Milo, R., Itzkovitz, S., Kashtan, N., Levitt, R., Shen-Orr, S., Ayzenshtat, I., Sheffer, M., and Alon, U. (2004). Superfamilies of evolved and designed networks. Science 303, 1538–1542.
Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., and Alon, U. (2002). Network motifs: simple building blocks of complex networks. Science 298, 824–827.
Mo, M.L., and Palsson, B.O. (2009). Understanding human metabolic physiology: a genome-to-systems approach. Trends Biotechnol. 27, 37–44.
Mohr, S., Bakal, C., and Perrimon, N. (2010). Genomic screening with RNAi: results and challenges. Annu. Rev. Biochem. 79, 37–64.
Novick, P., Osmond, B.C., and Botstein, D. (1989). Suppressors of yeast actin mutations. Genetics 121, 659–674.
Nurse, P. (2003). The great ideas of biology. Clin. Med. 3, 560–568.
Oberhardt, M.A., Palsson, B.O., and Papin, J.A. (2009). Applications of genome-scale metabolic reconstructions. Mol. Syst. Biol. 5, 320.
Oti, M., Snel, B., Huynen, M.A., and Brunner, H.G. (2006). Predicting disease genes using protein-protein interactions. J. Med. Genet. 43, 691–698.
Pan, X., Yuan, D.S., Xiang, D., Wang, X., Sookhai-Mahadeo, S., Bader, J.S., Hieter, P., Spencer, F., and Boeke, J.D. (2004). A robust toolkit for functional profiling of the yeast genome. Mol. Cell 16, 487–496.
Park, J., Lee, D.S., Christakis, N.A., and Barabasi, A.L. (2009). The impact of cellular networks on disease comorbidity. Mol. Syst. Biol. 5, 262.
Pastor-Satorras, R., Smith, E., and Sole, R.V. (2003). Evolving protein interaction networks through gene duplication. J. Theor. Biol. 222, 199–210.
Perlis, R.H., Smoller, J.W., Mysore, J., Sun, M., Gillis, T., Purcell, S., Rietschel, M., Nothen, M.M., Witt, S., Maier, W., et al. (2010). Prevalence of incompletely penetrant Huntington’s disease alleles among individuals with major depressive disorder. Am. J. Psychiatry 167, 574–579.
Piano, F., Schetter, A.J., Morton, D.G., Gunsalus, K.C., Reinke, V., Kim, S.K., and Kemphues, K.J. (2002). Gene clustering based on RNAi phenotypes of ovary-enriched genes in C. elegans. Curr. Biol. 12, 1959–1964.
Plewczynski, D., and Ginalski, K. (2009). The interactome: predicting the protein-protein interactions in cells. Cell. Mol. Biol. Lett. 14, 1–22.
Pujana, M.A., Han, J.-D.J., Starita, L.M., Stevens, K.N., Tewari, M., Ahn, J.S., Rennert, G., Moreno, V., Kirchhoff, T., Gold, B., et al. (2007). Network modeling links breast cancer susceptibility and centrosome dysfunction. Nat. Genet. 39, 1338–1349.
Raser, J.M., and O’Shea, E.K. (2005). Noise in gene expression: origins, consequences, and control. Science 309, 2010–2013.
Ravasz, E., Somera, A.L., Mongru, D.A., Oltvai, Z.N., and Barabasi, A.L. (2002). Hierarchical organization of modularity in metabolic networks. Science 297, 1551–1555.
Reboul, J., Vaglio, P., Rual, J.F., Lamesch, P., Martinez, M., Armstrong, C.M., Li, S., Jacotot, L., Bertin, N., Janky, R., et al. (2003). C. elegans ORFeome version 1.1: experimental verification of the genome annotation and resource for proteome-scale protein expression. Nat. Genet. 34, 35–41.
Reece-Hoyes, J.S., Deplancke, B., Shingles, J., Grove, C.A., Hope, I.A., and Walhout, A.J. (2005). A compendium of Caenorhabditis elegans regulatory transcription factors: a resource for mapping transcription regulatory networks. Genome Biol. 6, R110.
Rigaut, G., Shevchenko, A., Rutz, B., Wilm, M., Mann, M., and Seraphin, B. (1999). A generic protein purification method for protein complex characterization and proteome exploration. Nat. Biotechnol. 17, 1030–1032.
Roberts, P.M. (2006). Mining literature for systems biology. Brief. Bioinform. 7, 399–406.
Robinson, C.V., Sali, A., and Baumeister, W. (2007). The molecular sociology of the cell. Nature 450, 973–982.
Root, D.E., Hacohen, N., Hahn, W.C., Lander, E.S., and Sabatini, D.M. (2006). Genome-scale loss-of-function screening with a lentiviral RNAi library. Nat. Methods 3, 715–719.
Rual, J.F., Venkatesan, K., Hao, T., Hirozane-Kishikawa, T., Dricot, A., Li, N., Berriz, G.F., Gibbons, F.D., Dreze, M., Ayivi-Guedehoussou, N., et al. (2005). Towards a proteome-scale map of the human protein-protein interaction network. Nature 437, 1173–1178.
Ruvkun, G., Wightman, B., and Ha, I. (2004). The 20 years it took to recognize the importance of tiny RNAs. Cell 116, S93–S96.
Rzhetsky, A., Wajngurt, D., Park, N., and Zheng, T. (2007). Probing genetic overlap among complex human phenotypes. Proc. Natl. Acad. Sci. USA 104, 11694–11699.
Schuster-Bockler, B., and Bateman, A. (2008). Protein interactions in human genetic diseases. Genome Biol. 9, R9.
Schuster, S., Fell, D.A., and Dandekar, T. (2000). A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks. Nat. Biotechnol. 18, 326–332.
Seebacher, J., and Gavin, A.-C. (2011). SnapShot: Protein-protein interaction networks. Cell 144, this issue, 1000.
Segal, E., Shapira, M., Regev, A., Pe’er, D., Botstein, D., Koller, D., and Friedman, N. (2003). Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat. Genet. 34, 166–176.
Shapira, S.D., Gat-Viks, I., Shum, B.O.V., Dricot, A., de Grace, M.M., Wu, L., Gupta, P.B., Hao, T., Silver, S.J., Root, D.E., et al. (2009). A physical and regulatory map of host-influenza interactions reveals pathways in H1N1 infection. Cell 139, 1255–1267.
Shen-Orr, S.S., Milo, R., Mangan, S., and Alon, U. (2002). Network motifs in the transcriptional regulation network of Escherichia coli. Nat. Genet. 31, 64–68.
Shoval, O., and Alon, U. (2010). SnapShot: network motifs. Cell 143, 326–326.e1.
Simonis, N., Rual, J.F., Carvunis, A.R., Tasan, M., Lemmons, I., Hirozane-Kishikawa, T., Hao, T., Sahalie, J.M., Venkatesan, K., Gebreab, F., et al. (2009). Empirically controlled mapping of the Caenorhabditis elegans protein-protein interactome network. Nat. Methods 6, 47–54.
Singh, G.P., Ganapathi, M., and Dash, D. (2006). Role of intrinsic disorder in transient interactions of hub proteins. Proteins 66, 761–765.
Sönnichsen, B., Koski, L.B., Walsh, A., Marschall, P., Neumann, B., Brehm, M., Alleaume, A.M., Artelt, J., Bettencourt, P., Cassin, E., et al. (2005). Full-genome RNAi profiling of early embryogenesis in Caenorhabditis elegans. Nature 434, 462–469.
Stelzl, U., Worm, U., Lalowski, M., Haenig, C., Brembeck, F.H., Goehler, H., Stroedicke, M., Zenkner, M., Schoenherr, A., Koeppen, S., et al. (2005). A human protein-protein interaction network: a resource for annotating the proteome. Cell 122, 957–968.
Stratton, M.R., Campbell, P.J., and Futreal, P.A. (2009). The cancer genome. Nature 458, 719–724.
Stuart, J.M., Segal, E., Koller, D., and Kim, S.K. (2003). A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249–255.
Sturtevant, A.H. (1956). A highly specific complementary lethal system in Drosophila melanogaster. Genetics 41, 118–123.
Taniguchi, Y., Choi, P.J., Li, G.W., Chen, H., Babu, M., Hearn, J., Emili, A., and Xie, X.S. (2010). Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science 329, 533–538.
Taylor, I.W., Linding, R., Warde-Farley, D., Liu, Y., Pesquita, C., Faria, D., Bull, S., Pawson, T., Morris, Q., and Wrana, J.L. (2009). Dynamic modularity in protein interaction networks predicts breast cancer outcome. Nat. Biotechnol. 27, 199–204.
Tong, A.H., Evangelista, M., Parsons, A.B., Xu, H., Bader, G.D., Page, N., Robinson, M., Raghibizadeh, S., Hogue, C.W., Bussey, H., et al. (2001). Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294, 2364–2368.
Turinsky, A.L., Razick, S., Turner, B., Donaldson, I.M., and Wodak, S.J. (2010). Literature curation of protein interactions: measuring agreement across major public databases. Database (Oxford), 2010, baq026.
Uetz, P., Dong, Y.A., Zeretzke, C., Atzler, C., Baiker, A., Berger, B., Rajagopala, S.V., Roupelieva, M., Rose, D., Fossum, E., et al. (2006). Herpesviral protein networks and their interaction with the human proteome. Science 311, 239–242.
Uetz, P., Giot, L., Cagney, G., Mansfield, T.A., Judson, R.S., Knight, J.R., Lockshon, D., Narayan, V., Srinivasan, M., Pochart, P., et al. (2000). A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403, 623–627.
Vaquerizas, J.M., Kummerfeld, S.K., Teichmann, S.A., and Luscombe, N.M. (2009). A census of human transcription factors: function, expression and evolution. Nat. Rev. Genet. 10, 252–263.
Vázquez, A., Flammini, A., Maritan, A., and Vespignani, A. (2003). Modeling of protein interaction networks. Complexus 1, 38–44.
Venkatesan, K., Rual, J.F., Vazquez, A., Stelzl, U., Lemmens, I., Hirozane-Kishikawa, T., Hao, T., Zenkner, M., Xin, X., Goh, K.I., et al. (2009). An empirical framework for binary interactome mapping. Nat. Methods 6, 83–90.
Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A., et al. (2001). The sequence of the human genome. Science 291, 1304–1351.
Vermeirssen, V., Barrasa, M.I., Hidalgo, C.A., Babon, J.A., Sequerra, R., Doucette-Stamm, L., Barabasi, A.L., and Walhout, A.J. (2007). Transcription factor modularity in a gene-centered C. elegans core neuronal protein-DNA interaction network. Genome Res. 17, 1061–1071.
Vidal, M. (1997). The reverse two-hybrid system. In The Yeast Two-Hybrid System, P. Bartels and S. Fields, eds. (New York: Oxford University Press), pp. 109–147.
Vidal, M. (2001). A biological atlas of functional maps. Cell 104, 333–339.
Vidal, M. (2009). A unifying view of 21st century systems biology. FEBS Lett. 583, 3891–3894.
von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S., and Bork, P. (2002). Comparative assessment of large-scale data sets of protein-protein interactions. Nature 417, 399–403.
Wachi, S., Yoneda, K., and Wu, R. (2005). Interactome-transcriptome analysis reveals the high centrality of genes differentially expressed in lung cancer tissues. Bioinformatics 21, 4205–4208.
Walhout, A.J. (2006). Unraveling transcription regulatory networks by protein-DNA and protein-protein interaction mapping. Genome Res. 16, 1445–1454.
Walhout, A.J., Reboul, J., Shtanko, O., Bertin, N., Vaglio, P., Ge, H., Lee, H., Doucette-Stamm, L., Gunsalus, K.C., Schetter, A.J., et al. (2002). Integrating interactome, phenome, and transcriptome mapping data for the C. elegans germline. Curr. Biol. 12, 1952–1958.
Walhout, A.J., Sordella, R., Lu, X., Hartley, J.L., Temple, G.F., Brasch, M.A., Thierry-Mieg, N., and Vidal, M. (2000a). Protein interaction mapping in C. elegans using proteins involved in vulval development. Science 287, 116–122.
Walhout, A.J., Temple, G.F., Brasch, M.A., Hartley, J.L., Lorson, M.A., van den Heuvel, S., and Vidal, M. (2000b). GATEWAY recombinational cloning: application to the cloning of large numbers of open reading frames or ORFeomes. Methods Enzymol. 328, 575–592.
Walhout, A.J., and Vidal, M. (2001). Protein interaction maps for model organisms. Nat. Rev. Mol. Cell Biol. 2, 55–62.
Wu, X., Jiang, R., Zhang, M.Q., and Li, S. (2008). Network-based global inference of human disease genes. Mol. Syst. Biol. 4, 189.
Wuchty, S., Oltvai, Z.N., and Barabasi, A.L. (2003). Evolutionary conservation of motif constituents in the yeast protein interaction network. Nat. Genet. 35, 176–179.
Yeger-Lotem, E., Sattath, S., Kashtan, N., Itzkovitz, S., Milo, R., Pinter, R.Y., Alon, U., and Margalit, H. (2004). Network motifs in integrated cellular networks of transcription-regulation and protein-protein interaction. Proc. Natl. Acad. Sci. USA 101, 5934–5939.
Yu, H., Braun, P., Yildirim, M.A., Lemmens, I., Venkatesan, K., Sahalie, J., Hirozane-Kishikawa, T., Gebreab, F., Li, N., Simonis, N., et al. (2008). High-quality binary protein interaction map of the yeast interactome network. Science 322, 104–110.
Zhang, L.V., King, O.D., Wong, S.L., Goldberg, D.S., Tong, A.H.Y., Lesage, G., Andrews, B., Bussey, H., Boone, C., and Roth, F.P. (2005). Motifs, themes and thematic maps of an integrated Saccharomyces cerevisiae interaction network. J. Biol. 4, 6.
Zhong, Q., Simonis, N., Li, Q.R., Charloteaux, B., Heuze, F., Klitgord, N., Tam, S., Yu, H., Venkatesan, K., Mou, D., et al. (2009). Edgetic perturbation models of human inherited disorders. Mol. Syst. Biol. 5, 321.
Zhu, C., Byers, K.J., McCord, R.P., Shi, Z., Berger, M.F., Newburger, D.E., Saulrieta, K., Smith, Z., Shah, M.V., Radhakrishnan, M., et al. (2009). High-resolution DNA-binding specificity analysis of yeast transcription factors. Genome Res. 19, 556–566.

SnapShot: Protein-Protein Interaction Networks

Jan Seebacher and Anne-Claude Gavin
Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
Cell 144, March 18, 2011. ©2011 Elsevier Inc.

[Figure: SnapShot poster with six panels: Experimental Methods for Charting Protein-Protein Interaction Networks; Interaction Network Data Quality / Benchmark Parameters; Prediction of Protein Complex Topology; Network Components; Network Topologies; Network Measures. The drawings are not recoverable; the legible panel text is summarized here.]

Experimental methods panel. Binary interactions: yeast two-hybrid (split transcription factor or ubiquitin; transcription readout) and protein fragment complementation assay (split dihydrofolate reductase with an antibiotic resistance readout, or split GFP or YFP with a fluorescence readout); interaction strength: transient to stable. Molecular machines/protein complex comembership: affinity purification/mass spectrometry (biochemical purification of affinity-tagged baits followed by MS identification of copurifying preys); interaction strength: stable. Interaction examples listed: allosteric, chaperone-assisted, signaling, enzyme-substrate.

Data quality panel. An experimental PPI data set is compared with a "gold standard" (PRS, positive reference list) and a "negative set" (RRS, random reference set, or set of proteins unlikely to interact) to mark coverage, false positives, false negatives, true negatives, and novel interactions.

Complex topology panel. Spoke model, matrix model, and socio-affinity model.

Network components panel. Node (protein); edge (link between two nodes, i.e., an interaction); hub (node with high degree); party hubs (same time and space); date hubs (different time and/or space), distinguished by expression profiles and/or localization.

Network topologies panel. Random network: degrees follow a Poisson (or peaked) distribution; vulnerability to failure. Scale-free network: degrees follow a power-law distribution; robustness against random failure; vulnerability to targeted attacks. Hierarchical network: degrees follow a power-law distribution; accounts for modularity, local clustering, and scale-free topology; high clustering coefficient (C). The panel also carries the annotations "biological/cellular networks" and "many types of real networks".

Network measures panel (worked examples for a node A; each refers to the example graph drawn in its own panel). Degree/connectivity: k_A = number of edges through A = 5. Clustering coefficient/interconnectivity: C_A = n_A/[k_A(k_A − 1)/2] = 2/[4 × (4 − 1)/2] = 0.333, where n_A is the number of actual links between A's neighbors and k_A(k_A − 1)/2 is the number of possible links between them. Assortativity/average nearest neighbor's connectivity: NC_A = (k_B + k_C + k_D + k_E + k_J)/5 = (5 + 2 + 2 + 3 + 1)/5 = 2.6. Shortest path between two nodes: SP_FH = (F, D, A, B, H) = 4. Betweenness/centrality: B_A = fraction of SPs passing through A = 0.090.

DOI 10.1016/j.cell.2011.02.025

Cellular functions result from the coordinated action of groups of proteins interacting in molecular assemblies, pathways, or networks. The systematic and unbiased charting of protein-protein interaction (PPI) networks relevant to health or disease in a variety of organisms has become an important challenge in systems biology. Here, we review the main parameters and characteristics used to define PPI networks.

Experimental Methods for Charting Protein-Protein Interaction Networks

Genetic, ex vivo methods, such as the yeast two-hybrid system or protein complementation assays, detect pairwise (binary) interactions, including transient ones. Biochemical approaches that rely on affinity purification and mass spectrometry-based protein identification (AP-MS) are designed to characterize sturdy protein complexes. Direct, binary associations can be inferred from highly reciprocal AP-MS data sets by computational methods, such as the spoke and matrix models or socio-affinity scoring.

Data Quality

The overall quality of protein-protein interaction data sets is routinely evaluated by comparison with gold standards (lists of known, high-confidence, literature-curated interacting proteins) and negative sets (lists of proteins that do not interact). In practice, true negative sets are generally replaced by sets of proteins that are unlikely to interact: for example, sets of random protein pairs or lists of proteins never reported to "co-occur" (coexpression, colocalization, coconservation, gene fusion, synteny, text mining, etc.). To ensure proper use by other scientists, researchers are encouraged to follow the minimum information about a molecular interaction experiment (MIMIx) guidelines when reporting PPI data to public databases (Orchard et al., 2007). The following benchmark parameters can be deduced: (1) the false positive rate (FPR, background) is the fraction of identified interactions that are in the negative set; (2) the true positive rate (TPR, or accuracy) is the fraction of identified interactions that are genuine: TPR = 1 − FPR; (3) the false negative rate (FNR) is the fraction of the gold standard interactions that have been missed; (4) the coverage is the fraction of the gold standard covered by an experimental data set: coverage = 1 − FNR; and (5) the true negative rate (TNR) is the fraction of genuine noninteracting protein pairs that were correctly left unidentified.
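The benchmark parameters defined above can be computed directly from sets of protein pairs. The following Python sketch is illustrative only: the names `pair` and `benchmark`, the toy protein labels, and the reading of TNR as "fraction of the negative set not identified" are assumptions made for this example, not part of the SnapShot.

```python
def pair(a, b):
    """Represent an undirected protein pair so that (A, B) equals (B, A)."""
    return frozenset((a, b))


def benchmark(identified, gold_standard, negative_set):
    """Compute the five benchmark parameters from sets of protein pairs.

    identified    -- pairs reported by the experimental data set
    gold_standard -- known high-confidence interacting pairs
    negative_set  -- pairs assumed not (or unlikely) to interact
    """
    # (1) FPR: fraction of identified interactions found in the negative set
    fpr = len(identified & negative_set) / len(identified)
    # (2) TPR = 1 - FPR, following the definition in the text
    tpr = 1 - fpr
    # (3) FNR: fraction of gold standard interactions that were missed
    fnr = len(gold_standard - identified) / len(gold_standard)
    # (4) coverage = 1 - FNR
    coverage = 1 - fnr
    # (5) TNR: fraction of negative-set pairs that were not identified
    tnr = len(negative_set - identified) / len(negative_set)
    return {"FPR": fpr, "TPR": tpr, "FNR": fnr, "coverage": coverage, "TNR": tnr}


# Hypothetical toy data: four identified pairs, four gold standard pairs,
# four negative-set pairs.
identified = {pair("A", "B"), pair("A", "C"), pair("A", "D"), pair("E", "F")}
gold_standard = {pair("A", "B"), pair("A", "C"), pair("B", "C"), pair("G", "H")}
negative_set = {pair("E", "F"), pair("A", "G"), pair("B", "H"), pair("C", "I")}

scores = benchmark(identified, gold_standard, negative_set)
```

In this toy case one of the four identified pairs falls in the negative set and two of the four gold standard pairs are missed. The pair A-D is in neither reference set, so it counts as a novel interaction and does not affect any of the five values.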

Prediction of Protein Complex Topologies

The spoke model assumes that the bait interacts directly with each of the copurifying proteins. The matrix model assumes that any two proteins within the set of copurifying proteins interact directly with each other. Socio-affinity scoring integrates both the spoke and matrix models. It measures the frequency with which pairs of proteins were found associated within sets of biochemical purifications; protein pairs with high scores are more likely to be in direct physical contact.
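The difference between the spoke and matrix models is easiest to see on a single purification. The sketch below is a minimal illustration, assuming a purification is given as one bait plus a list of copurifying preys; the function names and the example proteins are hypothetical. Socio-affinity scoring is not implemented here because its scoring formula is not given in the text.

```python
from itertools import combinations


def spoke_edges(bait, preys):
    """Spoke model: the bait contacts every copurifying prey directly."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_edges(bait, preys):
    """Matrix model: every pair of proteins in the purification is in contact."""
    members = {bait, *preys}
    return {frozenset(p) for p in combinations(sorted(members), 2)}


# Hypothetical purification: bait A pulls down preys C and E.
spoke = spoke_edges("A", ["C", "E"])
matrix = matrix_edges("A", ["C", "E"])

# Pairs predicted only by the matrix model (prey-prey contacts).
matrix_only = matrix - spoke
```

For n copurifying proteins (bait included) the spoke model predicts n − 1 edges and the matrix model n(n − 1)/2, so the matrix model always contains the spoke edges and adds every prey-prey pair.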

Network Components

Node, protein; edge, interaction or link between two nodes; hub, node with a high degree (see below); date hub, hub that does not coexpress and/or colocalize with interacting nodes (interactions at different times or locations); party hub, hub that coexpresses and/or colocalizes with interacting nodes (simultaneous interactions).

Network Topologies and Structures

In random networks, all nodes have approximately the same number of edges (same degree, see below). In scale-free networks, most of the nodes have only a few edges, and a few nodes (also called hubs) have a very large number of edges; the degree distribution approximates a power law. In hierarchical networks, sparsely linked nodes are components of highly clustered areas, and a few hubs connect these highly clustered regions. Hierarchical networks are characterized by their modularity, a power-law degree distribution, and a large average clustering coefficient. The different topologies confer different vulnerability to perturbation. Whereas random networks are sensitive to random failures, scale-free networks are resistant to them. However, scale-free networks are susceptible to targeted attacks on their hubs.
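The link between degree distribution and vulnerability can be illustrated on two deliberately tiny graphs. This is a toy sketch under stated assumptions, not a model of real PPI networks: a six-node ring stands in for a network in which all nodes have the same degree, and a six-node star stands in for a hub-dominated network. The helper names are hypothetical.

```python
from collections import Counter


def degree_distribution(adj):
    """Return P(k): the fraction of nodes that have exactly k edges."""
    counts = Counter(len(neighbors) for neighbors in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}


def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)  # removed nodes are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best


# Ring: every node has degree 2 (homogeneous degrees).
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}

# Star: node 0 is a hub linked to five degree-1 nodes.
star = {0: {1, 2, 3, 4, 5}}
star.update({i: {0} for i in range(1, 6)})

ring_pk = degree_distribution(ring)
star_pk = degree_distribution(star)

after_leaf_loss = largest_component(star, removed={1})  # loss of a peripheral node
after_hub_loss = largest_component(star, removed={0})   # targeted attack on the hub
after_ring_loss = largest_component(ring, removed={0})
```

Removing a peripheral node from the star leaves the remaining five nodes connected, whereas removing the hub isolates every node. The ring loses only the removed node in either case.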

Network Measures

The degree, or connectivity (k), measures how many edges a node has to other nodes. The degree distribution, P(k), gives the probability that a selected node has exactly k edges, and is used to characterize different classes of networks. The clustering coefficient (C) measures the degree of interconnectivity in the neighborhood of a node. It is the ratio between the number of observed links between the neighbors of a node and the number of links that would exist if all of this node's neighbors were connected to each other. The average clustering coefficient, <C>, characterizes the overall tendency of nodes to form clusters and is a measure of a network's hierarchical property. The assortativity, or neighbor connectivity (NC), represents the average degree of the nearest neighbors of a node. In PPI networks, highly connected nodes (hubs) tend not to link directly to each other but to interact with sparsely connected nodes. The distance, or path length, between two nodes is the number of edges that separate them. There may be many alternative paths between two nodes; the shortest path (SP) has the smallest number of edges. The betweenness (B) is a measure of a node's centrality in a network. It represents the fraction of all of the shortest paths between all nodes in a network that pass through a given node.
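The measures defined above can be sketched in plain Python on a small undirected graph. Everything here is illustrative: the function names and the seven-node example graph are assumptions, and `betweenness` implements one literal reading of the definition in the text (shortest paths running through the node divided by all shortest paths in the network), which differs from the per-pair normalization used by some network libraries.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj, node):
    """Degree k: number of edges attached to the node."""
    return len(adj[node])


def clustering(adj, node):
    """C = observed links among neighbors / possible links k(k-1)/2."""
    k = len(adj[node])
    if k < 2:
        return 0.0
    observed = sum(1 for u, v in combinations(adj[node], 2) if v in adj[u])
    return observed / (k * (k - 1) / 2)


def neighbor_connectivity(adj, node):
    """NC: average degree of the node's nearest neighbors."""
    return sum(len(adj[n]) for n in adj[node]) / len(adj[node])


def shortest_paths_from(adj, source):
    """Breadth-first search: distance and number of shortest paths to each node."""
    dist, count = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first visit fixes the distance
                dist[v] = dist[u] + 1
                count[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u precedes v on a shortest path
                count[v] += count[u]
    return dist, count


def betweenness(adj, node):
    """Fraction of all shortest paths (over all node pairs) passing through node."""
    info = {s: shortest_paths_from(adj, s) for s in adj}
    dist_n, count_n = info[node]
    through = total = 0
    for s, t in combinations(sorted(adj), 2):
        dist_s, count_s = info[s]
        if t not in dist_s:
            continue  # s and t are not connected
        total += count_s[t]
        if node in (s, t):
            continue  # endpoints do not count as "passing through"
        if dist_s[node] + dist_n[t] == dist_s[t]:
            through += count_s[node] * count_n[t]
    return through / total


# Hypothetical graph: two triangles sharing node A, plus a tail B-F-G.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
         ("B", "C"), ("D", "E"), ("B", "F"), ("F", "G")]
adj = build_graph(edges)

k_A = degree(adj, "A")
C_A = clustering(adj, "A")
NC_A = neighbor_connectivity(adj, "A")
SP_GD = shortest_paths_from(adj, "G")[0]["D"]
B_A = betweenness(adj, "A")
```

In this example A has four neighbors with two links among them, so C_A = 2/[4 × (4 − 1)/2], the same arithmetic as the clustering example in the figure. The shortest path from G to D runs G, F, B, A, D and has four edges.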

Protein-Protein Interaction Resources

Biocarta, www.biocarta.com; BioGRID, www.thebiogrid.org; BOND, www.bond.unleashedinformatics.com; DIP, www.dip.doe-mbi.ucla.edu; HPRD, www.hprd.org; IntAct, www.ebi.ac.uk/intact; MINT, www.mint.bio.uniroma2.it; Reactome, www.reactome.org; SGD, www.yeastgenome.org; STRING, www.string-db.org.

Visualization Tools for Biological Networks

Cytoscape, www.cytoscape.org; NAViGaTOR, www.ophid.utoronto.ca/navigator; Osprey, www.biodata.mshri.on.ca/osprey/servlet/Index.

References

Behrends, C., Sowa, M.E., Gygi, S.P., and Harper, J.W. (2010). Network organization of the human autophagy system. Nature 466, 68–76.

Han, J.D.J., Bertin, N., Hao, T., Goldberg, D.S., Berriz, G.F., Zhang, L.V., Dupuy, D., Walhout, A.J., Cusick, M.E., Roth, F.P., and Vidal, M. (2004). Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature 430, 88–93.

Kühner, S., van Noort, V., Betts, M.J., Leo-Macias, A., Batisse, C., Rode, M., Yamada, T., Maier, T., Bader, S., Beltran-Alvarez, P., et al. (2009). Proteome organization in a genome-reduced bacterium. Science 326, 1235–1240.

Orchard, S., Salwinski, L., Kerrien, S., Montecchi-Palazzi, L., Oesterheld, M., Stümpflen, V., Ceol, A., Chatr-aryamontri, A., Armstrong, J., Woollard, P., et al. (2007). The minimum information required for reporting a molecular interaction experiment (MIMIx). Nat. Biotechnol. 25, 894–898.

Tarassov, K., Messier, V., Landry, C.R., Radinovic, S., Serna Molina, M.M., Shames, I., Malitskaya, Y., Vogel, J., Bussey, H., and Michnick, S.W. (2008). An in vivo map of the yeast protein interactome. Science 320, 1465–1470.

Venkatesan, K., Rual, J.F., Vazquez, A., Stelzl, U., Lemmens, I., Hirozane-Kishikawa, T., Hao, T., Zenkner, M., Xin, X., Goh, K.I., et al. (2009). An empirical framework for binary interactome mapping. Nat. Methods 6, 83–90.

Yamada, T., and Bork, P. (2009). Evolution of biomolecular networks: lessons from metabolic and protein interactions. Nat. Rev. Mol. Cell Biol. 10, 791–803.

Yu, H., Braun, P., Yildirim, M.A., Lemmens, I., Venkatesan, K., Sahalie, J., Hirozane-Kishikawa, T., Gebreab, F., Li, N., Simonis, N., et al. (2008). High-quality binary protein interaction map of the yeast interactome network. Science 322, 104–110.