Leading Edge Essay
Network News: Innovations in 21st Century Systems Biology
Adam P. Arkin1,4,* and David V. Schaffer1,2,3,4
1 Department of Bioengineering
2 Department of Chemical and Biomolecular Engineering
3 The Helen Wills Neuroscience Institute
University of California, Berkeley, Berkeley, CA 94720, USA
4 Physical Biosciences Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
*Correspondence: [email protected]
DOI 10.1016/j.cell.2011.03.008
A decade ago, seminal perspectives and papers set a strong vision for the field of systems biology, and a number of these themes have flourished. Here, we describe key technologies and insights that have elucidated the evolution, architecture, and function of cellular networks, ultimately leading to the first predictive genome-scale regulatory and metabolic models of organisms. Can systems approaches bridge the gap between correlative analysis and mechanistic insights?
844 Cell 144, March 18, 2011 ©2011 Elsevier Inc.
Systems biology aims to understand how individual elements of the cell interact to generate behaviors that allow survival in changeable environments and collective cellular organization into structured communities. Ultimately, these cellular networks assemble into larger population networks to form large-scale ecologies and thinking machines, such as humans. Given this central focus on codifying the organizational principles and algorithms of life, we argue that systems biology is not a newly emerging field. Rather, it is a mature synthesis of thought about the implications of biological structure and its dynamic organization, ideas that have been brewing for more than a century.
To many scientists, the beginning of the last decade marked the definition and rise of the field of systems biology. However, systems biology's conceptual origins date back almost 100 years. In 1917, D'Arcy Thompson formalized the first link between development, evolution, and physics in his treatise On Growth and Form, when he observed that the shapes and functions of biological systems were fundamentally determined by physical requirements and mechanical laws. Walter Cannon, then chairman of the Department of Physiology at Harvard Medical School, coined the term "homeostasis" when he noted that organisms hold essential physiological variables at constant values despite a fluctuating environment (Cannon, 1939). In 1943, the American mathematician Norbert Wiener, along with his coauthors, proposed that negative feedback loops would be central to maintaining this stability in biological systems (Rosenbleuth et al., 1943), thus linking concepts of control and optimality with biological dynamics. Ten years later, the British developmental biologist Conrad Waddington laid some of the modern foundation for systems biology when he presciently conceptualized networks of cellular components (i.e., genes, cells, and tissues) as evolutionarily dynamical systems expressible as solutions to a series of simultaneous differential equations. Over his long career, Waddington argued for a truly dynamic systems theory of cellular decision making driven by gene expression and epigenetics (Waddington, 1954, 1977). When François Jacob and Jacques Lucien Monod unveiled the molecular mechanisms of gene regulation in 1962, they noted, "it is obvious from the analysis of these [bacterial genetic regulatory] mechanisms that their known elements could be connected into a wide variety of 'circuits' endowed with any desired degree of stability" (Jacob and Monod, 1962).
During the ensuing decade, scientists across a wide array of disciplines started exploring the nonlinear dynamics of biochemical networks. Experimental data to support their theoretical hypotheses were still largely missing. Even so, this period was quite productive, as numerous fundamental principles came to light. These included the following:
- the possible mechanisms and advantages of different biochemical switches and oscillators, with and without biochemical noise (Goodwin, 1963);
- new models of metabolic control and engineering (Heinrich and Rapoport, 1974; Kacser and Burns, 1973);
- the reverse engineering of cellular networks (Bekey and Beneken, 1978);
- abstracted models of these networks, used to understand the evolution and optimization of specific network "designs" (Kauffman, 1969).
Indeed, these latter principles of how networks can be structured to achieve particular functions have been used more recently to explicitly predict natural network behavior.
Thus, by the early 1970s, the concepts and components were all in place for most of what we now call "systems biology": the integrated molecular analysis of cellular networks. However, one roadblock remained: experimental data to support the models and hypotheses. This is where the last two decades have revolutionized the field of cellular network inference and analysis. Since the early 1990s, a vast array of technologies has dramatically improved three things: the efficiency of manipulating cells genetically; the measurement of cellular components at high precision and completeness; and the dissemination of materials and information at unprecedented speeds. The last of these is due to the other network revolution, which has also left a conceptual mark on systems biology. Many of these biological technologies are scaling according to a Moore's Law-type dynamic (Moore, 1965). Every few years, the amount of DNA that can be sequenced or synthesized doubles in size for half the cost, as has the number of transistors on a microchip (Carlson, 2003). Clearly, this ability to read and write genomic information has profoundly accelerated systems biology.
Principles versus Prediction and Correlation versus Causation
This brief historical perspective suggests that discoveries in systems biology may be organized within a conceptual space (Figure 1). The y axis distinguishes between two relatively distinct objectives: deducing the principles of network organization necessary for particular behaviors, versus reverse engineering networks to predict their behavior. Strikingly, with the advent of scaling biological data, two general approaches (spanning the x axis) have evolved to meet these objectives. On one hand, correlative studies, which usually operate on the genomic scale, infer relationships among genes and modules of function. These studies can also annotate genes and their products by a "guilt-by-association" approach, in which detailed biochemical information available about one gene or system is transferred to others with correlated behaviors. On the other hand, a "causal" approach tracks direct interactions among molecules to glean mechanistic insights. Interestingly, as genetic and biochemical technologies climb the scaling curves, correlative and causal studies have become more intermingled. It is becoming possible to rapidly alter any gene (Paddison et al., 2004), modulate any gene's expression level, and perhaps even reorganize large regions of the genome (Gibson et al., 2010; Wang et al., 2009; Warner et al., 2010). As a result, mechanistic studies will become available at a genome scale.
Obviously, prediction is not truly antipodal to principles, nor is correlation distantly removed from causation; indeed, the quadrants are connected. However, we asked a group of colleagues which systems biology papers over the last decade have been most important to the field. The resulting set of landmark studies naturally clustered into different regions of this systems biology "plane" (Figure 1).
Figure 1. A Simplified Scheme for Organizing Results in the Field of Systems Biology. References are placed (subjectively) into this space according to two criteria: whether their respective study focused more on mechanistic insight or on large-scale correlation analysis (the x axis), and whether the results were primarily principles about cellular networks or predictions of their behavior (the y axis). (Because of space constraints, only the last name of the first author is given.)
Correlative Approaches
Genome-scale data have fundamentally changed the types of questions that we ask about cellular systems. We can now observe how genomes dynamically change expression in response to environmental conditions. We can then correlate these results with other phenotypes, such as growth, fate choices, and biosynthetic productivity. Such experiments have inspired several classes of analysis that can vastly improve the data-driven annotation of genomes. These analyses can also link genotype to phenotype more strongly through inferred networks of interaction, and predict behaviors of cellular systems (Figure 1, lower-left quadrant). They have also led to a wide array of conceptual interpretations (Figure 1, upper-left quadrant), concerning:
- the organization and evolution of cellular networks into evolvable modules;
- the decomposition of these networks into recurrent regulatory "motifs" with useful dynamical function;
- the robustness of these architectures to mutation.
Correlative Approaches to Predicting Function
One type of analysis infers properties of biomolecules from correlated changes in genome-scale RNA, protein, DNA copy number, or metabolite abundance as these vary in time and across conditions. Most often, genes sharing common expression dynamics are inferred to share regulators and possibly functional roles, at least at some level (Brown and Botstein, 1999). The challenge in this area has been isolating the set of correlated genes, both from the background of measurement noise and from genes with merely coincident coexpression. Clustering techniques have been used for decades to derive relationships in complex correlative data sets, such as gene expression compendia. In 2000, however, Cheng and Church introduced an algorithm called "biclustering" that explicitly discovers "modules" from such data. This method identifies groups of genes, or "modules," with similar patterns of expression over a specific subset of conditions (Cheng and Church, 2000). Individual genes may belong to multiple modules, thereby allowing inference of their numerous functions and combinatorial regulation. This important work inspired an increasing number of algorithms concerned with identifying related sets of biomolecules from complex data and inferring their "modular" function. These algorithms thereby opened the door to discovering an apparent hierarchical modular architecture of cellular regulation. This architecture complements the more informal "pathway" organization with which biologists were familiar. The modules of coherent function also greatly simplify the construction and interpretation of predictive models, because they enable prediction of how different modules, rather than the individual constituent genes, are dynamically deployed. Such a system formulation has far fewer variables and thus requires far less data.
Gene expression can be an indirect measure of a component's contribution to a particular cellular process; thus, genetic perturbations and activity assays may be required. In seminal work, Giaever et al. (2002) constructed a bar-coded deletion library for the entire genome of Saccharomyces cerevisiae. This library enabled single-pot assays of the relative growth or fitness of each strain when exposed to a specific condition (Giaever et al., 2002). In a subsequent study, a growth phenotype for nearly every gene in yeast was identified using 1000 chemical perturbations (Hillenmeyer et al., 2008). These types of studies can rapidly dissect the cellular targets of drugs and even directly identify the specific transporters involved. These studies have also shown that the genes whose expression changes under a given condition are not always the genes required to respond functionally to that condition (Giaever et al., 2002). The implications of this result are not fully understood. One obvious conclusion, however, is that different types of experiments are required to deduce, or even predict, the function of genes.
Correlative Prediction of Organization
Another type of analysis seeks to infer relationships among gene modules. In other words, the strategy used to infer the function of a single gene is extended to infer the underlying biochemical network (Arkin et al., 1997). In 2001, Ideker et al. combined genetic, macromolecular-interaction, and expression data (for both proteins and genes) to infer how the galactose utilization network in yeast is regulated (Ideker et al., 2001). They then used the resulting "influence network" to predict how the system responds to genetic perturbations. Some of these predictions were validated by experiments, but others proved incorrect, suggesting that properties of this well-characterized regulatory network still await discovery.
Variants of this approach applied additional, more sophisticated algorithms from multivariate statistics and machine learning, and they quickly began to have a strong impact on the field. In particular, Hartemink et al. (2001) offered perhaps the first Bayesian approach for scoring different network structural hypotheses (i.e., different patterns of molecular interaction) against data. Using a collection of 52 conditions, they demonstrated that it was possible to infer the regulatory interactions in the galactose pathway (Hartemink et al., 2001). Two years later, Segal et al. (2003) increased the power of these algorithms to infer the sets of genes (i.e., modules) regulated by particular transcription factors under specific conditions. This algorithm also correctly predicted new regulatory roles for less-characterized proteins (Segal et al., 2003). In particular, the model predicted roles for one putative transcription factor (Ypl230w) and two signaling molecules (Kin82 and Ppt1). These proteins were predicted to be important for the cellular response to three different conditions: heat shock, hypo-osmotic shift, and entry into stationary phase, respectively. Disrupting these genes elicited no expression phenotype in rich, unstressed conditions. In contrast, it caused strong changes in expression, relative to wild-type, in the condition predicted to be relevant for a given gene.
Applying a different statistical approach, called partial least squares regression, Janes and colleagues undertook herculean efforts to measure and correlate several cellular quantities (Janes et al., 2005, 2006):
- mammalian cell survival;
- apoptosis;
- intracellular protein phosphorylation states;
- kinase activities.
These were measured in response to combinations of extracellular growth factor and cytokine inputs, generating a data set of 7980 intracellular measurements. The resulting model successfully predicted the level of apoptosis as a function of cytokine inputs. It also led to the new mechanistic insight that cascades of autocrine signaling mediate downstream cell responses to the extracellular cues.
Shortly thereafter, in another landmark paper, Bonneau et al. (2007) demonstrated how the output of a new gene expression biclustering algorithm could provide input to a clever regression algorithm. That regression algorithm deciphers the transcriptional regulatory network of an archaeon (Halobacterium salinarum NRC-1) and predicts expression responses to more than 100 conditions (Bonneau et al., 2007). Recently, such correlative systems analyses have begun scaling up to link biomolecular networks to ecological networks. These pioneering studies are uncovering new scales of biological organization that should lead to entirely new principles of ecosystem function (Zhou et al., 2010).
Nevertheless, two questions remain open. First, it is not yet clear how to optimally design perturbation repertoires to maximize accuracy, both in annotating gene function and regulation and in predictive model inference, with minimal expense. Second, it has yet to be proven that the models obtained in these types of studies are sufficiently accurate or inexpensive to have an impact in a medical or industrial setting. Nonetheless, collecting such compendia of data, even from diverse types of experiments, is rapidly becoming feasible for even a single laboratory. We predict that the increased accessibility of these large-scale data sets will enable the detailed characterization of organisms after their genomes are sequenced. Ultimately, it may change what it means to "complete" the genome of an organism.
Uncovering Principles of Network Organization
The fact that clear functional modules of gene expression can be inferred from correlative data sets implies the existence of underlying organizational principles for these networks. Similar hierarchies of modules have been found in large-scale protein interaction data and metabolic networks. Certain "scale-free" topologies of molecular interaction networks have received considerable attention in biology and other fields. Such topologies seem to arise often in both natural and human-designed systems. They are characterized by a pattern of interconnectedness among the nodes (e.g., proteins) in which the distribution of the number of interactions per node follows a power law. Influential papers have suggested that these topologies lead to robustness to perturbation (Jeong et al., 2000). In the case of proteins, they may arise naturally from the evolutionary process of duplication and divergence (Rzhetsky and Gomez, 2001). Likewise, developmental biologists have argued for decades that integrated cellular processes can evolve only if they are dissociable into hierarchical, modular units. Such units can adapt their behavior with little interference from other units. Thus, interaction and expression modules may allow rapid, effective rewiring and tuning of internal dynamics (Price et al., 2007; Singh et al., 2008). This ability to evolve may even be a selectable trait (Earl and Deem, 2004). However, caution must be taken in assigning evolutionary meaning to apparent modularity (Lynch, 2007).
On slightly smaller size scales, certain topological motifs (that is, stereotypical small networks of regulatory interactions and chemical reactions) may have important control functions for cellular networks (Rao and Arkin, 2001). In the last decade, the availability of large-scale data has enabled the discovery that certain motifs appear more often than expected by random chance (Shen-Orr et al., 2002). These include feed-forward and feedback loops (for more on feed-forward loops, see Review by Yosef and Regev on page 886 of this issue). These motifs have potential functional importance, such as noise rejection. They appear physiologically robust but also evolutionarily flexible, with tunable function (Voigt et al., 2005). Milo et al. (2002) hypothesized that these motifs might form a sort of basis set of dynamic functions. From this basis set, complex optimized networks could be assembled in numerous contexts within and outside of biology (Milo et al., 2002).
A beautiful theoretical paper by Segrè et al. (2005) determined another organizational principle of cellular networks. First, they showed that functional modules could be inferred from growth phenotypes of double knockout mutants. Second, they showed that the epistatic interactions between pairs of genes in these modules always fell into one of two classes. In buffering interactions, epistasis diminishes the individual phenotypic effects of the two mutations. In aggravating interactions, the deleterious individual effects of two mutations are worsened by their combination (Segrè et al., 2005). Modules were thus "monochromatic" and never mixed the two interaction types, a principle that was recently verified experimentally (Costanzo et al., 2010).
These architectural principles, uncovered from large sets of correlative data, are evocative and well supported. The challenge remains, however, to find incontrovertible evidence for evolutionary selection of these architectures and to fully characterize their functional consequences.
Mechanistic Approaches to Study Causal Relationships
Large-scale genomic data sets lend themselves to statistical analysis of correlation. Causal analysis, in contrast, requires more detailed biochemical data on the networks' effectors, such as proteins, second messengers, and metabolites. Unfortunately, the experimental analyses of these components have not enjoyed the same growth in scale as those of nucleic acids. Volumes of data on one-dimensional genomes are readily available. Causal analysis, however, also requires multidimensional data on biomolecules' interactions, reactions and their rates, localization, and transport. Mass spectrometry, imaging, genetic sensors, chemical probes, and other technologies are increasingly providing such data, but not yet at the same magnitude as genomic information. As a result, causal analyses of cellular networks initially focused on elucidating functional principles. They are now increasingly empowered with data to enable prediction.
Uncovering Principles of Function
Large-scale models of biological networks face two challenges. First, molecular mechanisms are often complex and nonlinear (e.g., cooperative protein interactions and epigenetic regulation). Second, many of their inherent parameters are unknown (e.g., affinities and rate constants). However, in some model systems, the biochemistry is sufficiently well characterized to enable the construction of elegant, large-scale models. As a prime example, Tyson and colleagues modeled the cell-cycle control system of Saccharomyces cerevisiae using a set of 35 ordinary differential equations (ODEs) representing molecular mechanisms and mass action (Chen et al., 2004) (for more on modeling the cell cycle, see Primer by Ferrell et al. on page 874 of this issue). The goal of the model was not to account for the full complexity of the system. Instead, it aimed to provide a reasonable approximation of network behavior and to uncover dynamical principles of the architecture. Indeed, their model succeeded in accounting for a majority of the mutant phenotypes simulated.
Using a similar framework, El-Samad et al. (2005a, 2005b) modeled the heat shock response in Escherichia coli. The response itself is simple: chaperones are deployed to keep proteins folded at higher temperature. Even so, this model uncovered complexity in the modular control structure of the system. It also demonstrated how the many feedback loops in this system enable a fast, robust response while limiting the energetic cost of heat shock protein expression (El-Samad et al., 2005a, 2005b). In another important study, Yi et al. used dynamical systems control theory to analyze bacterial chemotaxis (Yi et al., 2000), another system with well-characterized biochemistry. The study built on the principle that negative feedback is often central to biological stability (Rosenbleuth et al., 1943). It found that integral feedback control underlies the robustness of network adaptation to significant perturbations in both the amounts and the kinetic parameters of its component proteins. Interestingly, control engineers "reinvented" this strategy and proved that it is required, under certain conditions, to build robustness into electrical circuits and other systems.
Deterministic representations of networks are compromised when their constituents are present at low concentrations or undergo slow reactions. Moreover, early studies suggested that noise can significantly influence network function (Arkin et al., 1998). Elowitz et al. (2002) explored the principle that fluctuations in the quantities and reaction rates of gene expression machinery can cause noise in gene expression. This noise acts both at a global, cell-wide level (extrinsic noise) and at the level of an individual gene (intrinsic noise) (Elowitz et al., 2002). Indeed, subsequent single-molecule imaging studies directly confirmed that both translation (Yu et al., 2006) and transcription (Raj et al., 2006) can underlie such noisy protein expression.
The principle that noise is inherent in biological networks raised the question of whether its effects on biological fitness are neutral, positive, or negative. The value of noise depends on the system, but in certain cases noise appears to make positive contributions to fitness. Organisms need to adapt to changing environments, and they have two adaptation strategies: sensing and responding to change, or stochastically switching phenotype.
Two theoretical studies arrived at the principle that stochastic fluctuations in an organism's phenotype can increase its fitness under some conditions (Kussell and Leibler, 2005; Wolf et al., 2005). Examples include conditions in which transitions in selective environments are slow or cannot be sensed. In a study that combined experimental approaches with simulations, Weinberger et al. (2005) investigated this principle by analyzing stochastic effects in HIV infection (Weinberger et al., 2005). Three factors lead to very noisy gene expression: low initial numbers of viral molecules, slow gene expression, and amplification by a positive feedback loop. For some infections, this noise yielded long delays in gene expression. This delayed expression contributed to the formation of latent HIV, which is clinically recognized as the most formidable barrier to eliminating the virus from a patient.
In an elegant study, Acar et al. (2008) engineered Saccharomyces cerevisiae strains that stochastically switched phenotypes at different rates. Interestingly, the fast-switching strain outgrew the slow-switching strain in environments undergoing rapid fluctuations. Conversely, the slow-switching strain was more fit in environments that fluctuated slowly (Acar et al., 2008).
Predictive Analysis of Network and Cell Function
The complexity of molecular mechanisms and the scarcity of biochemical parameters often make the development of predictive models challenging. Ibarra et al. (2002) created a constraints-based whole-cell metabolic model for E. coli. In this model, stoichiometric, thermodynamic, and other constraints mathematically yielded a solution space of allowed metabolic network states (Ibarra et al., 2002). This model requires fewer parameters than full dynamical models. It can predict the network states that optimize growth under different environmental conditions. Indeed, when Ibarra et al. grew E. coli on a new carbon substrate, the cells evolved to the metabolic state predicted by the model.
In some systems, substantive comparison to data can yield deterministic models that are increasingly capable of prediction. Hoffmann and colleagues analyzed the mammalian NF-κB system (Hoffmann et al., 2002). In this system, activation of the transcription factor upregulates expression of IκBα, a negative regulator of NF-κB. Integrating experimental data with a deterministic model enabled prediction of the oscillatory behavior of this module upon stimulation and perturbation. Finally, Schoeberl et al. (2002) developed a model with 94 ODEs to simulate epidermal growth factor signaling through MAP kinase, including receptor trafficking dynamics and intracellular phosphorylation cascades (Schoeberl et al., 2002). This was the first dynamic model of a large cellular signaling network to meet two criteria. It was carefully parameterized by prior experimental measurements, and it yielded predictions of signal transduction dynamics that were subsequently validated experimentally.
The Next Decade
As systems biology matures, the number of studies linking correlation with causation and principles with prediction continues to grow (Figure 1). Advances in measurement technologies will enable large-scale experiments across an array of parameters and conditions. These experiments will increasingly meld correlative and causal approaches in two directions: correlative analyses will lead to mechanistic hypothesis testing, and causal models will gain sufficient data to make predictions. In addition, the growing number of sequenced organisms, together with easier measurement and genetic manipulation, will enable deep comparison of systems across phylogenetic trees. This comparison will enhance our understanding of the mechanistic features necessary for function and evolution. The increasing integration of experimental and computational technologies will thus corroborate, deepen, and diversify the theories that the earliest systems biologists used logic to infer. This progress will inch us ever closer to that central question: "What is Life?"
ACKNOWLEDGMENTS
We would like to thank our colleagues for suggesting a number of the papers that we reference in this work, and we apologize for not being able to include all of them. The authors would like to acknowledge the National Institutes of Health (R01 GM073010-01). Work conducted by ENIGMA was supported by the Office of Science, Office of Biological and Environmental Research of the US Department of Energy under contract number DE-AC02-05CH11231.
REFERENCES
Acar, M., Mettetal, J.T., and van Oudenaarden, A. (2008). Nat. Genet. 40, 471–475.
Arkin, A., Ross, J., and McAdams, H.H. (1998). Genetics 149, 1633–1648.
Arkin, A.P., Shen, P.-D., and Ross, J. (1997). Science 277, 1275.
Bekey, G.A., and Beneken, J.E.W. (1978). Automatica 14, 41–47.
Bonneau, R., Facciotti, M.T., Reiss, D.J., Schmid, A.K., Pan, M., Kaur, A., Thorsson, V., Shannon, P., Johnson, M.H., Bare, J.C., et al. (2007). Cell 131, 1354–1365.
Brown, P.O., and Botstein, D. (1999). Nat. Genet. 21(1, Suppl), 33–37.
Cannon, W. (1939). The Wisdom of the Body (London: Norton).
Carlson, R. (2003). Biosecurity and Bioterrorism: Biodefense Strategy, Practice, and Science 1, 203–214.
Chen, K.C., Calzone, L., Csikasz-Nagy, A., Cross, F.R., Novak, B., and Tyson, J.J. (2004). Mol. Biol. Cell 15, 3841–3862.
Cheng, Y., and Church, G.M. (2000). Proc. Int. Conf. Intell. Syst. Mol. Biol. 8, 93–103.
Costanzo, M., Baryshnikova, A., Bellay, J., Kim, Y., Spear, E.D., Sevier, C.S., Ding, H., Koh, J.L., Toufighi, K., Mostafavi, S., et al. (2010). Science 327, 425–431.
Earl, D.J., and Deem, M.W. (2004). Proc. Natl. Acad. Sci. USA 101, 11531–11536.
El-Samad, H., Khammash, M., Homescu, C., and Petzold, L. (2005a). Proceedings 16th IFAC World Congress. http://engineering.ucsb.edu/cse/Files/IFACC_HS_OPT04.pdf.
El-Samad, H., Kurata, H., Doyle, J.C., Gross, C.A., and Khammash, M. (2005b). Proc. Natl. Acad. Sci. USA 102, 2736–2741.
Elowitz, M.B., Levine, A.J., Siggia, E.D., and Swain, P.S. (2002). Science 297, 1183–1186.
Giaever, G., Chu, A.M., Ni, L., Connelly, C., Riles, L., Véronneau, S., Dow, S., Lucau-Danila, A., Anderson, K., André, B., et al. (2002). Nature 418, 387–391.
Gibson, D.G., Glass, J.I., Lartigue, C., Noskov, V.N., Chuang, R.-Y., Algire, M.A., Benders, G.A., Montague, M.G., Ma, L., Moodie, M.M., et al. (2010). Science 329, 52–56.
Goodwin, B.C. (1963). (London: Academic Press).
Hartemink, A.J., Gifford, D.K., Jaakkola, T.S., and Young, R.A. (2001). In Pacific Symposium on Biocomputing 2001 (PSB01), R. Altman, A.K. Dunker, L. Hunter, K. Lauderdale, and T. Klein, eds. (New Jersey: World Scientific), pp. 422–433.
Heinrich, R., and Rapoport, T.A. (1974). Eur. J. Biochem. 42, 89–95.
Hillenmeyer, M.E., Fung, E., Wildenhain, J., Pierce, S.E., Hoon, S., Lee, W., Proctor, M., St Onge, R.P., Tyers, M., Koller, D., et al. (2008). Science 320, 362–365.
Hoffmann, A., Levchenko, A., Scott, M.L., and Baltimore, D. (2002). Science 298, 1241–1245.
Ibarra, R.U., Edwards, J.S., and Palsson, B.O. (2002). Nature 420, 186–189.
Ideker, T., Thorsson, V., Ranish, J.A., Christmas, R., Buhler, J., Eng, J.K., Bumgarner, R., Goodlett, D.R., Aebersold, R., and Hood, L. (2001). Science 292, 929–934.
Jacob, F., and Monod, J. (1962). Cold Spring Harb. Symp. Quant. Biol. 26, 193–211.
Janes, K.A., Albeck, J.G., Gaudet, S., Sorger, P.K., Lauffenburger, D.A., and Yaffe, M.B. (2005). Science 310, 1646–1653.
Janes, K.A., Gaudet, S., Albeck, J.G., Nielsen, U.B., Lauffenburger, D.A., and Sorger, P.K. (2006). Cell 124, 1225–1239.
Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N., and Barabási, A.L. (2000). Nature 407, 651–654.
Kacser, H., and Burns, J.A. (1973). Symp. Soc. Exp. Biol. 27, 65–104.
Kauffman, S.A. (1969). J. Theor. Biol. 22, 437–467.
Kussell, E., and Leibler, S. (2005). Science 309, 2075–2078.
Lynch, M. (2007). Proc. Natl. Acad. Sci. USA 104 (Suppl 1), 8597–8604.
Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., and Alon, U. (2002). Science 298, 824–827.
Moore, G.E. (1965). Electronics 38, 114–117.
Paddison, P.J., Silva, J.M., Conklin, D.S., Schlabach, M., Li, M., Aruleba, S., Balija, V., O’Shaughnessy, A., Gnoj, L., Scobie, K., et al. (2004). Nature 428, 427–431.
Price, M.N., Dehal, P.S., and Arkin, A.P. (2007). PLoS Comput. Biol. 3, 1739–1750.
Raj, A., Peskin, C.S., Tranchina, D., Vargas, D.Y., and Tyagi, S. (2006). PLoS Biol. 4, e309.
Rao, C.V., and Arkin, A.P. (2001). Annu. Rev. Biomed. Eng. 3, 391–419.
Rosenbleuth, A., Wiener, N., and Bigelow, J. (1943). Philos. Sci. 10, 18–43.
Rzhetsky, A., and Gomez, S.M. (2001). Bioinformatics 17, 988–996.
Schoeberl, B., Eichler-Jonsson, C., Gilles, E.D., and Müller, G. (2002). Nat. Biotechnol. 20, 370–375.
Segal, E., Shapira, M., Regev, A., Pe’er, D., Botstein, D., Koller, D., and Friedman, N. (2003). Nat. Genet. 34, 166–176.
Segrè, D., Deluna, A., Church, G.M., and Kishony, R. (2005). Nat. Genet. 37, 77–83.
Shen-Orr, S.S., Milo, R., Mangan, S., and Alon, U. (2002). Nat. Genet. 31, 64–68.
Singh, A.H., Wolf, D.M., Wang, P., and Arkin, A.P. (2008). Proc. Natl. Acad. Sci. USA 105, 7500–7505.
Voigt, C.A., Wolf, D.M., and Arkin, A.P. (2005). Genetics 169, 1187–1202.
Waddington, C.H. (1954). Proceedings of the 9th International Congress of Genetics 9, 232–245.
Waddington, C.H. (1977). Tools for thought (New York: Basic Books).
Wang, H.H., Isaacs, F.J., Carr, P.A., Sun, Z.Z., Xu, G., Forest, C.R., and Church, G.M. (2009). Nature 460, 894–898.
Warner, J.R., Reeder, P.J., Karimpour-Fard, A., Woodruff, L.B., and Gill, R.T. (2010). Nat. Biotechnol. 28, 856–862.
Weinberger, L.S., Burnett, J.C., Toettcher, J.E., Arkin, A.P., and Schaffer, D.V. (2005). Cell 122, 169–182.
Wolf, D.M., Vazirani, V.V., and Arkin, A.P. (2005). J. Theor. Biol. 234, 227–253.
Yi, T.-M., Huang, Y., Simon, M.I., and Doyle, J. (2000). Proc. Natl. Acad. Sci. USA 97, 4649–4653.
Yu, J., Xiao, J., Ren, X., Lao, K., and Xie, X.S. (2006). Science 311, 1600–1603.
Zhou, J., Deng, Y., Luo, F., He, Z., Tu, Q., and Zhi, X. (2010). MBio. 1, e00169-10.
Leading Edge Essay
The Cell in an Era of Systems Biology
Paul Nurse1,2,* and Jacqueline Hayles1 1Cancer Research UK, London Research Institute, 44, Lincoln’s Inn Fields, London UK WC2A 3LY, UK 2The Rockefeller University, 1230 York Avenue, New York, NY 10021-6399, USA *Correspondence: [email protected] DOI 10.1016/j.cell.2011.02.045
The increasing use of high-throughput technologies and computational modeling is revealing new levels of biological function and organization. How are these features of systems biology influencing our view of the cell?
It is difficult to forecast the impact of systems biology on our understanding of the cell, an issue not made any easier by the fact that there is as yet no firm consensus as to what is meant by “systems biology,” although as our colleague Marc Kirschner has said, “we all seem to know it when we see it.” And in that spirit we will discuss here the various attributes and methodological approaches usually associated with systems biology, how they have been applied to cell biology, and how they may be developed to attain a better understanding of how cells work.
Reductionism and Holism
Discussions of systems biology often make a distinction between holistic and reductionist approaches. Our view is that scientific explanations and methodologies are essentially reductionist in nature. However, although it is difficult to imagine a scientific enquiry or explanation that is not reductionist, it is important to keep a focus on the behavior of whole systems in biology and to understand how the interactions and processes brought about by component parts acting at lower levels in a system are constrained by overall functions acting at higher levels.
Sometimes those of a more holistic persuasion object to the dominance of molecular explanations in cell biology, but the fact is that most useful explanations in cell biology have to be in terms of molecules because molecules are the most relevant lower-level component into which to decompose the function and organization of the cell. However, not all explanations in biology are molecular; for example, developmental processes may be explained in terms of cell behavior (Towers and Tickle, 2009) and neurobiology by the action of neural networks (Langston et al., 2010), and likewise some insights in cell biology may not arise from strictly molecular explanations.
Biological Function and Organization
One approach to systems biology has been to emphasize the overall biological functions expressed at different levels of biological organization, such as the organelle, the cell, the tissue, the organ, and the organism. The level of the cell occupies a particularly important position (Brenner, 2010; Nurse, 2008) as it is the simplest unit exhibiting the characteristics of life, so understanding biological function at the level of the cell brings us closer to a better appreciation of the nature of life. The differing levels or units of organization from organelle to organism often exhibit teleonomic, that is, apparently purposeful, behaviors (Monod, 1972). Examples of purposeful behavior include homeostasis and the maintenance of organizational integrity, the generation of spatial and temporal order, communication within and between the units of organization, and the reproduction of those units. The objective of this approach is to understand how teleonomic behaviors are generated at the different units of organization, usually in terms of molecules and of interactions between molecules. This view of systems biology stresses overall biological function of the relevant biological unit and is an approach encompassed by a number of traditional biological disciplines including physiology and forward genetics.
An interest in the overall biological functions of a living organism naturally leads to consideration of the influence of ecology and evolution on how that organism works. This also applies to cells, and increasingly those interested in systems approaches in cell biology are considering ecological and evolutionary perspectives (Ezov et al., 2006; Liti et al., 2009). Ecology is relevant to the relationships of a cell with other cells and with its physical environment and applies both to free-living single-celled organisms such as the yeasts and Protozoa and to cells within tissues. Ecological and evolutionary perspectives can help to understand how a cell has come to function as it does and to improve awareness of the selective pressures operating on a cell (Ding et al., 2010; Shah et al., 2009). Improved contacts between the ecological and evolutionary communities and cell biologists will enhance these studies.
Ensemble Descriptions
An approach often associated with systems biology is the generation of ensemble descriptions, that is, the collection of data describing the behavior of large numbers of components. This has been made possible by increasingly sophisticated technologies and analytical procedures, which have led to massively parallel collections of different types of data and the establishment of consortia such as ENCODE (http://encodeproject.org/) and databases such as the Saccharomyces Genome Database (SGD) and the fission yeast database (PomBase). The canonical ensemble approach has been whole-genome sequencing, which has allowed the description and comparison of gene contents for a wide range of organisms, facilitating molecular genetic analysis of biological mechanisms far beyond the limited numbers of genetically amenable model organisms.
Genome sequencing is particularly useful for cell biology because all living organisms are composed of cells, and so orthologous genes important for cellular phenomena can be studied in a variety of organisms. Cells in different organisms or in different tissues of the same organism allow orthologous genes and related cellular phenomena to be investigated in a range of situations, yielding informative comparisons. A good example has been the comparison of cell-cycle control in yeast cells with metazoan embryos (Gould and Nurse, 1989; Murray and Kirschner, 1989). Knowledge of whole-genome sequences also allows gene ablation experiments to be carried out on a genome-wide basis. Two major methodologies have been used: systematic gene deletions and libraries of small-interfering RNAs (siRNAs). Other methodologies such as transposition have been used, particularly in prokaryotes (Zhang and Lin, 2009). To date, whole-genome gene deletions have only been completed in bacteria and yeasts (de Berardinis et al., 2008; Giaever et al., 2002; Kim et al., 2010) and have the advantage of completely ablating a gene function, making functional assignments and comparisons of gene functions between organisms more straightforward. siRNA libraries are very versatile as they can be employed in many organisms but can be subject to partial knockdown and off-target effects (Sioud, 2011). Successes using these approaches include the identification of all genes required for the viability of budding and fission yeast cells, for cellular processes such as centromeric cohesion in budding yeast (Marston et al., 2004), and for mitosis in human cells (Kittler et al., 2007; Neumann et al., 2010). Ensemble descriptions have been used extensively, including microarrays for monitoring the types and levels of RNAs, mass spectrometry for studying proteins, and mass spectrometry and chromatography for assessing metabolites. Ensemble data collections have the advantage of avoiding the dangers of inadvertently “cherry-picking” data when studies are confined to work on limited numbers of gene products, which can result in attaching too much importance to a particular RNA or protein simply because it is the only one under investigation. Comparisons between different cells and organisms allow the identification of gene products that are implicated more universally in a particular cellular phenomenon. For example, a comparative approach has enabled the identification of RNAs whose levels change at transitions through specific cell-cycle stages in a conserved manner in more than one organism (Rustici et al., 2004).
Networks
The availability of ensemble datasets also allows the systematic grouping of genes with related functions. For example, catalogs of genes that when deleted have a similar cellular phenotype will identify gene sets required for particular processes. Similarly, RNA transcripts that behave in a similar manner, such as peaking in level at a particular phase of the cell cycle, reveal RNAs that potentially have related roles. In this way, the “toolkit” required for a specific cellular process can be assembled. Another grouping approach is to construct networks based on gene products that interact with each other. Such networks can be assembled using interaction trap methodologies (such as two-hybrid methodologies and immunoprecipitations) that assess whether molecules are in physical contact. Also important are catalytic interactions resulting in metabolic changes or chemical modifications, such as phosphorylation. Biochemical approaches can be complemented by high-throughput genetic interaction assays (screening, for example, for synthetic lethality), although these do not necessarily provide evidence for direct physical interaction between components. Green fluorescent protein tags can be used to identify molecular components that spatially colocalize, as an indicator of potential functional relationships (Huh et al., 2003; Matsuyama et al., 2006). These various methodologies allow networks to be built up that connect molecular components throughout the cell to generate an overall cellular interactome (Collins et al., 2007; Rual et al., 2005). The power of these networks is enhanced when they are combined with catalogs of genes involved in a particular cellular function because they lead to a better molecular understanding of the process of interest. Interaction networks can also identify linker components that connect different functional networks and processes (Zhong et al., 2009) and may therefore have interesting regulatory roles.
For many researchers, the creation of interaction networks is a major goal of systems cell biology that is aimed at providing complete networks of different cellular processes. However, achieving this aim may require more sophisticated languages or notations to fully describe how the networks work. Unlike simple networks, such as an airline transportation network, the interaction linkages in biological networks may represent stable complexes or transient catalytic reactions or may reflect the logical nature of the interaction, for example a representation of a negative or positive feedback. The notation used in network descriptions needs to reflect this complexity. It is also important to take account of the fact that the linkages are not always hard-wired because they are mostly based on chemistry, with connections established by chemicals diffusing from one component to another. These chemical linkages can readily break and reform to connect different components and remodel the architecture of the network (Bray, 2009).
Quantitative Methodologies
Quantitative methodologies involving both large datasets and the modeling of data are frequently used in systems approaches. The massively parallel collections of data as generated by microarrays, for example, have superseded the more qualitative measurements of traditional molecular biology with techniques such as northern and western blotting. An advantage of good quantification is that it leads to a better appreciation of the effect of the number of molecules within a cell on biological processes. This allows an assessment of the stoichiometry between different molecular components as well as recognition that there may be only a few molecules of a particular type present within a cell. Some gene transcripts in yeast are present at an average of less than one per cell (Velculescu et al., 1997), shifting our view of regulation from being driven by mass action, which is analog in character, to one that is more stochastic and digital. This means that greater attention is needed on the influence of molecular noise on the cell (Newman et al., 2006). An important issue is whether noise and the variability it generates between cells are exploited for regulatory purposes, for example to ensure a range of cellular responses to environmental changes such as the competence state of Bacillus subtilis (Maamar et al., 2007; Suel et al., 2006). Monitoring of single-cell behaviors has revealed that there is a significant variation between cells that was not appreciated previously by global population analyses (Choi and Kim, 2009). The combined use of photomicrography and robotic microscopes is capable of generating large amounts of data to investigate these effects of noise and stochastic behaviors in cells, for example cell size variability at the G1-S transition in budding yeast (Di Talia et al., 2007).
Biochemical processes are generally modeled by deriving differential equations to calculate flux through pathways using in vivo estimates of the rate constants and concentrations of components. Although modeling in cell biology has become more popular in recent years, in part due to the massive increase in data available and to the migration of more theoretically inclined scientists to biology, in the past it was only pursued by a few committed individuals (Novak and Tyson, 1997; Tyson, 1983). The evolutionary biologist John Maynard Smith contended that the act of thinking about a model’s equations greatly clarifies understanding of how the model works. Biologists have a tendency to produce somewhat loosely formulated models summarized in the form of cartoons, and it is useful to subject these to the discipline of writing equations in the expectation that the thought imposed by equation writing will improve understanding of the model’s assumptions and dynamics.
However, two major problems are often encountered when generating mathematical models for cell biology: the complexity of the pathways being modeled and the difficulty of estimating the appropriate values for rate constants and the concentration of components. Biochemical pathways are often complex with many redundant functions, reflecting the fact that evolution does not always lead to the most efficient and economic solutions from an engineer’s point of view (Jacob, 1977; Saunders and Ho, 1976). Natural selection acts on pre-existing cells often by making additions to previously operational pathways, and these additions increase redundancy. In this respect modeling in biology may differ from physics, where the aesthetic is to search for the simplest and most elegant model to explain a phenomenon. In biology, there are often more elements in a model than are strictly necessary, and some act redundantly. The number of elements also increases the degrees of freedom available, reducing confidence in the outcome of the modeling process.
There are several ways these difficulties can be addressed. One way used by modelers is to test the sensitivity of models to make sure that they still work well when the parameters used in the equations are varied. If the model still behaves robustly when different values are used in the equations, then confidence in the model is increased. It is also helpful if the biological function being studied can be recapitulated in vitro. Many quite complex processes can be carried out in concentrated Xenopus egg and cell extracts, for example important aspects of cell-cycle control (Blow and Laskey, 1986; Deibler and Kirschner, 2010). In Xenopus extracts, for instance, the levels of biochemical components can be both measured and manipulated more easily than is possible in a living cell. Fluorochrome-based sensor modules combined with light microscopy are also providing better ways of measuring concentrations within cells in vivo, such as protein levels in budding yeast cells (Newman et al., 2006). Another approach is to simplify the biochemical network underlying the biological function of interest, although this is only useful if the essential elements of that process are still maintained. The advantage of simplification is that it reduces the degrees of freedom available, making modeling easier and the outcome more reliable.
An example of the potential for network simplification is seen with a recent genetic manipulation of the mitotic control network in fission yeast (Coudreuse and Nurse, 2010). Many gene products have been identified that regulate the cyclin-dependent kinases (CDKs), and several quantitative models have been generated that can explain how CDKs are controlled to ensure orderly progression through the cell cycle at the correct cell size. Unexpectedly, a number of the gene products thought to have been important in CDK regulation and mitotic control can be eliminated while still maintaining good size control over mitotic onset. This simplified network focuses attention on those elements that are sufficient to generate good mitotic control and cell size homeostasis, reducing the degrees of freedom and making modeling more straightforward. In a way, this is synthetic biology in reverse; rather than building a simple network de novo, such as an oscillator or clock (Danino et al., 2010; Elowitz and Leibler, 2000), a pre-existing network is simplified. Both approaches lead to the same outcome: the generation of simpler models still capable of explaining the biological function of interest.
Managing Information
Networks and quantitative modeling are closely associated with information management within the cell. Many of the most insightful explanations in cell biology have been made in terms of information flow; this involves understanding how the cell gathers, processes, stores, and uses information in the context of a biological function or phenomenon of interest (Nurse, 2008). Information is gathered from both outside and inside the cell and is processed and communicated to different parts of the cell. Storage of information occurs over a wide range of timescales, from the long timescale seen in heredity (encoded in the DNA sequence and possibly mediated by epigenetics), through the medium timescale seen with mRNA and gene transcriptional circuits, to the short timescale seen with activated small G proteins (Bonasio et al., 2010; Etienne-Manneville and Hall, 2002; Roy et al., 2010). Information is used to direct cell behaviors, coordinating appropriate responses to changing circumstances. Recognition of the significance of information was crucial at the beginning of molecular biology, particularly in dealing with how information flowed from gene to protein, although it applies to all aspects of cell behavior. The iconic examples from that time are the concepts that DNA acts as a digital information storage device (Brenner et al., 1961) and that the lac operon regulatory circuit forms a negative feedback loop (Dickson et al., 1975; Lin and Riggs, 1975; Ohki and Sato, 1975). Systems biology, by generating datasets, networks, and models, provides an opportunity to understand information flow through the cell. In our view, this is one of the most important aspects of systems analyses in cell biology and will help move studies from descriptions of biological phenomena to a better understanding of how they work.
Information management involves various processing elements or logic modules that carry out particular computational functions, which can be categorized according to the type of function they carry out. For example, a negative feedback loop communicates information from a late step in a pathway to an earlier step: if there is increased flow at the later step, then a negative signal is sent to the earlier one, reducing overall flow through the pathway and thus maintaining homeostasis. In contrast, a positive feedback loop sends a positive signal that increases overall flow, generating a switch to maximum flow through the pathway. More complex logic modules produce more sophisticated responses, such as toggles switching between two states, timers measuring elapsed time, oscillators cycling in time, and gradients measuring cellular dimensions (Tyson et al., 2003). The operation of these modules depends on how the various components are linked together and the shapes of the response curves that determine the character of those interactions. There is a need to build on past work to construct a full listing of the different types of logic modules that are operational in cells. Working with engineers and cyberneticists should be helpful in achieving this goal (Alon, 2003; Nurse, 2008).
An emphasis on information management may reveal some unexpected features of cells. An example is the potential for dynamics to enrich information transfer through signaling pathways. Such pathways are usually thought of as on/off switches that can only be in one of two states. However, if signals are pulsed down the pathway and the output depends on the dynamics of those pulses, then more information can be communicated (von Kriegsheim et al., 2009). This is the same idea that forms the basis for Morse code, a system that communicates complex messages by utilizing the dynamics produced by a series of dots and dashes. Information is also managed in the three dimensions of cellular space (Scott and Pawson, 2009). Not only must spatial information be generated to define the space of the cell, but the availability of various cellular compartments means that different information can be stored in different places and a wide variety of connections between logic modules can be formed and reformed through diffusible chemicals. The richness of behavior possible with this arrangement is reminiscent of the complex behaviors normally associated with neural networks.
A Cell Biology Systems Initiative
The cell is the simplest unit that exhibits the characteristics of life and so is likely to be the most effective level in biology at which to investigate how life works. The tools and intellectual framework of systems biology will provide great opportunities to achieve this objective by generating the data needed and the approaches required for a comprehensive understanding of the cell. This applies to all types of cells including bacteria, which can have small genomes and where there have been great advances in recent years (Wang et al., 2010). But it is with eukaryotic cells that the greatest benefits are likely to be realized, because much work has already been achieved and the conservation of many processes across eukaryotes means that different cell types with differing characteristics and strengths in methodologies can be used to study the same biological phenomena.
It’s perhaps not surprising that we, as two yeast geneticists, would recommend the unicellular budding and fission yeasts as good models for studying many aspects of cell biology using a systems approach. Both organisms are eukaryotes with small genomes of only 5000–6000 genes, making systems genomic analyses more straightforward to carry out. The availability of genome-wide gene deletion collections together with other methods for saturation forward genetics (Guo and Levin, 2010) allows the identification of nearly all the genes in the genome that are involved in a particular cellular function or process. Application of interaction trap procedures together with bioinformatics will help to identify the biochemical roles of gene functions and to group them into the networks responsible for the process. Genetics can be used to simplify the network, focusing attention on the core gene functions responsible for the process to help with subsequent modeling. Comparisons with cells in other organisms will test whether the conclusions being reached can be generalized across species including human cells and also allow in vitro systems to be developed, especially with Xenopus egg extracts. A major aim with the initiative will be to explain as often as possible a cell biological function or process in terms of information management. This requires interdisciplinary approaches and is not so straightforward because cell biology experiments generally yield biochemical results, and there are no easy ways to translate chemistry into the information processing elements or logic modules that we have argued are needed for good understanding. It would be helpful if there were more effective ways to model pathways and networks without having to know all the rate constants and concentrations involved, and we have previously outlined possible procedures that may help with that elsewhere (Nurse, 2008). Despite these difficulties, we are now well placed to apply the methods of systems biology more comprehensively to cell biology to gain greater insight into how cells work.
ACKNOWLEDGMENTS
We would like to thank our colleagues at The Rockefeller University and CRUK London Research Institute, particularly L. Weston and J. Wu, for helpful comments on the manuscript. We would also like to acknowledge the many researchers whose work we have not been able to reference because of space constraints.
REFERENCES
Alon, U. (2003). Science 301, 1866–1867.
Blow, J.J., and Laskey, R.A. (1986). Cell 47, 577–587.
Bonasio, R., Tu, S., and Reinberg, D. (2010). Science 330, 612–616.
Bray, D. (2009). Wetware: A Computer in Every Living Cell (New Haven, CT: Yale University Press).
Brenner, S. (2010). Philos. Trans. R. Soc. Lond. B Biol. Sci. 365, 207–212.
Brenner, S., Jacob, F., and Meselson, M. (1961). Nature 190, 576–581.
Choi, J.K., and Kim, Y.J. (2009). Nat. Genet. 41, 498–503.
Collins, S.R., Miller, K.M., Maas, N.L., Roguev, A., Fillingham, J., Chu, C.S., Schuldiner, M., Gebbia, M., Recht, J., Shales, M., et al. (2007). Nature 446, 806–810.
Coudreuse, D., and Nurse, P. (2010). Nature 468, 1074–1079.
Danino, T., Mondragon-Palomino, O., Tsimring, L., and Hasty, J. (2010). Nature 463, 326–330.
de Berardinis, V., Vallenet, D., Castelli, V., Besnard, M., Pinet, A., Cruaud, C., Samair, S., Lechaplais, C., Gyapay, G., Richez, C., et al. (2008). Mol. Syst. Biol. 4, 174.
Deibler, R.W., and Kirschner, M.W. (2010). Mol. Cell 37, 753–767.
Di Talia, S., Skotheim, J.M., Bean, J.M., Siggia, E.D., and Cross, F.R. (2007). Nature 448, 947–951.
Dickson, R.C., Abelson, J., Barnes, W.M., and Reznikoff, W.S. (1975). Science 187, 27–35.
Ding, L., Ellis, M.J., Li, S., Larson, D.E., Chen, K., Wallis, J.W., Harris, C.C., McLellan, M.D., Fulton, R.S., Fulton, L.L., et al. (2010). Nature 464, 999–1005.
Elowitz, M.B., and Leibler, S. (2000). Nature 403, 335–338.
Etienne-Manneville, S., and Hall, A. (2002). Nature 420, 629–635.
Ezov, T.K., Boger-Nadjar, E., Frenkel, Z., Katsperovski, I., Kemeny, S., Nevo, E., Korol, A., and Kashi, Y. (2006). Genetics 174, 1455–1468.
Giaever, G., Chu, A.M., Ni, L., Connelly, C., Riles, L., Veronneau, S., Dow, S., Lucau-Danila, A., Anderson, K., Andre, B., et al. (2002). Nature 418, 387–391.
Gould, K.L., and Nurse, P. (1989). Nature 342, 39–45.
Guo, Y., and Levin, H.L. (2010). Genome Res. 20, 239–248.
Huh, W.K., Falvo, J.V., Gerke, L.C., Carroll, A.S., Howson, R.W., Weissman, J.S., and O’Shea, E.K. (2003). Nature 425, 686–691.
Jacob, F. (1977). Science 196, 1161–1166.
Kim, D.U., Hayles, J., Kim, D., Wood, V., Park, H.O., Won, M., Yoo, H.S., Duhig, T., Nam, M., Palmer, G., et al. (2010). Nat. Biotechnol. 28, 617–623.
Kittler, R., Pelletier, L., Heninger, A.K., Slabicki, M., Theis, M., Miroslaw, L., Poser, I., Lawo, S., Grabner, H., Kozak, K., et al. (2007). Nat. Cell Biol. 9, 1401–1412.
Langston, R.F., Ainge, J.A., Couey, J.J., Canto, C.B., Bjerknes, T.L., Witter, M.P., Moser, E.I., and Moser, M.B. (2010). Science 328, 1576–1580.
Lin, S., and Riggs, A.D. (1975). Cell 4, 107–111.
Liti, G., Carter, D.M., Moses, A.M., Warringer, J., Parts, L., James, S.A., Davey, R.P., Roberts, I.N., Burt, A., Koufopanou, V., et al. (2009). Nature 458, 337–341.
Maamar, H., Raj, A., and Dubnau, D. (2007). Science 317, 526–529.
Marston, A.L., Tham, W.H., Shah, H., and Amon, A. (2004). Science 303, 1367–1370.
Matsuyama, A., Arai, R., Yashiroda, Y., Shirai, A., Kamata, A., Sekido, S., Kobayashi, Y., Hashimoto, A., Hamamoto, M., Hiraoka, Y., et al. (2006). Nat. Biotechnol. 24, 841–847.
Monod, J. (1972). Chance and Necessity: An Essay on the Natural Philosophy of Modern Biology (New York: Vintage Books).
Murray, A.W., and Kirschner, M.W. (1989). Nature 339, 275–280.
Neumann, B., Walter, T., Heriche, J.K., Bulkescher, J., Erfle, H., Conrad, C., Rogers, P., Poser, I., Held, M., Liebel, U., et al. (2010). Nature 464, 721–727.
Newman, J.R., Ghaemmaghami, S., Ihmels, J., Breslow, D.K., Noble, M., DeRisi, J.L., and Weissman, J.S. (2006). Nature 441, 840–846.
Novak, B., and Tyson, J.J. (1997). Proc. Natl. Acad. Sci. USA 94, 9147–9152.
Nurse, P. (2008). Nature 454, 424–426.
Ohki, M., and Sato, S. (1975). Nature 253, 654–656.
Roy, S., Ernst, J., Kharchenko, P.V., Kheradpour, P., Negre, N., Eaton, M.L., Landolin, J.M., Bristow, C.A., Ma, L., Lin, M.F., et al. (2010). Science 330, 1787–1797.
Rual, J.F., Venkatesan, K., Hao, T., Hirozane-Kishikawa, T., Dricot, A., Li, N., Berriz, G.F., Gibbons, F.D., Dreze, M., Ayivi-Guedehoussou, N., et al. (2005). Nature 437, 1173–1178.
Rustici, G., Mata, J., Kivinen, K., Lio, P., Penkett, C.J., Burns, G., Hayles, J., Brazma, A., Nurse, P., and Bahler, J. (2004). Nat. Genet. 36, 809–817.
Saunders, P.T., and Ho, M.W. (1976). J. Theor. Biol. 63, 375–384.
Scott, J.D., and Pawson, T. (2009). Science 326, 1220–1224.
Shah, S.P., Morin, R.D., Khattra, J., Prentice, L., Pugh, T., Burleigh, A., Delaney, A., Gelmon, K., Guliany, R., Senz, J., et al. (2009). Nature 461, 809–813.
Sioud, M. (2011). Methods Mol. Biol. 703, 173–187.
Suel, G.M., Garcia-Ojalvo, J., Liberman, L.M., and Elowitz, M.B. (2006). Nature 440, 545–550.
Towers, M., and Tickle, C. (2009). Int. J. Dev. Biol. 53, 805–812.
Tyson, J.J. (1983). J. Theor. Biol. 104, 617–631.
Tyson, J.J., Chen, K.C., and Novak, B. (2003). Curr. Opin. Cell Biol. 15, 221–231.
Velculescu, V.E., Zhang, L., Zhou, W., Vogelstein, J., Basrai, M.A., Bassett, D.E., Jr., Hieter, P., Vogelstein, B., and Kinzler, K.W. (1997). Cell 88, 243–251.
von Kriegsheim, A., Baiocchi, D., Birtwistle, M., Sumpton, D., Bienvenut, W., Morrice, N., Yamada, K., Lamond, A., Kalna, G., Orton, R., et al. (2009). Nat. Cell Biol. 11, 1458–1464.
Wang, Y., Cui, T., Zhang, C., Yang, M., Huang, Y., Li, W., Zhang, L., Gao, C., He, Y., Li, Y., et al. (2010). J. Proteome Res. 9, 6665–6677.
Zhang, R., and Lin, Y. (2009). Nucleic Acids Res. 37, D455–D458.
Zhong, Q., Simonis, N., Li, Q.R., Charloteaux, B., Heuze, F., Klitgord, N., Tam, S., Yu, H., Venkatesan, K., Mou, D., et al. (2009). Mol. Syst. Biol. 5, 321.
Leading Edge Essay
Informing Biological Design by Integration of Systems and Synthetic Biology
Christina D. Smolke1,* and Pamela A. Silver2,* 1Department of Bioengineering, Stanford University, 473 Via Ortega, Stanford, CA 94305-4201, USA 2Department of Systems Biology and Wyss Institute of Biologically Inspired Engineering, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA *Correspondence: [email protected] (C.D.S.), [email protected] (P.A.S.) DOI 10.1016/j.cell.2011.02.020
Synthetic biology aims to make the engineering of biology faster and more predictable. In contrast, systems biology focuses on the interaction of myriad components and how these give rise to the dynamic and complex behavior of biological systems. Here, we examine the synergies between these two fields.
Biology is the technology of this century. The potential uses of biology to improve the human condition and the future of the planet are myriad. Over the last century, humans have used biology to make many useful things, in part based on discoveries from molecular biology. In addition, researchers have redesigned biological systems to test our fundamental understanding of their components and integrated functions. However, engineered biological systems still cannot approach the complexity, reliability, diversity, and richness exhibited by their natural counterparts. It is the combined promise of systems biology and synthetic biology that may drive transformative advances in our ability to program biological function.

One recent example of the successful engineering of a biological system to address a global challenge in health and medicine is the creation of microbes that produce a precursor to the antimalarial drug artemisinin (Ro et al., 2006). By shifting synthesis from the natural production host (a plant) to one better suited to rapid production and inexpensive scale-up (a microorganism), researchers were able to develop a process that enabled a cheaper supply of this drug, providing a more accessible cure for a disease that devastates developing countries. However, the research phase of this project required an investment of over $25 million and 150 person-years of highly trained researcher effort. This investment cannot realistically be replicated for every chemical or material to which we would apply this approach. Instead, imagine a time when a bioengineer designs a system at the computer, orders the necessary DNA encoding the specified system, and then begins the actual experiment of turning it into life. Thus, one overarching goal of synthetic biology is to make the engineering of biology faster, more affordable, and more predictable.

Biological systems and their underlying components offer a number of functional parallels with engineered systems. For example, biological sensors are exquisitely sensitive; the olfactory system can detect single odorant molecules and decode them. Biological systems can send and receive signals rapidly and in a highly specific manner. Pathways exist to sense and respond to the environment. Plants and microbes can use sunlight as an energy source. However, biological systems are also uniquely capable of self-replication, mutation, and selection, leading to evolution. Synthetic biologists aim to take advantage of these parallels and develop engineering principles for the design and construction of biological systems. An open question, however, is whether we understand biological systems well enough to redesign them to fulfill specific requirements.

Engineers enjoy the concept of interchangeable parts and modularity. Biology offers many sources of potential modularity but exhibits nonmodular features as well. For many years the gene was regarded as a fundamental modular unit of biology, capable of transferring a particular phenotype to the organism. However, we now know that genes display more fine-grained modularity in the form of promoters, open reading frames (ORFs), and regulatory elements. mRNAs contain sequences important for proper intracellular targeting and degradation. Proteins often contain targeting sequences, reactive centers, and degradation sequences. And lastly, entire pathways are modular in that some signaling pathways can be transferred from one organism to another to reconstruct a new state in the engineered organism. This modularity underlies one of the core concepts of synthetic biology: the notion that one can assemble biological systems from well-defined “parts” or modules (Endy, 2005). However, modular assembly approaches have largely remained confounded by the effects of context, that is, the nonmodular aspects of biology. For example, where a gene or an associated regulatory element is located in the genome can affect its expression and thus its function. In addition, the location of regulatory elements relative to each other and to ORFs can affect their encoded function (Haynes and Silver, 2009). Further analyses provided by systems biology may help to guide the development of standard strategies for assembling genetic modules into functional units.

Approaches to Synthetic Biology
Given that the goals of synthetic biology are to make the engineering of biology faster and more predictable, and to harness the power of biology for the
common good, the development of new approaches that support the design and construction of genetic systems has been a core activity within the field. Although advances have been made in both areas (fabrication and design), our ability to construct large genetic systems currently surpasses our ability to design such systems, resulting in a growing “design gap” that synthetic biologists must address.

The ability to synthesize large pieces of DNA corresponding to operons, entire pathways, chromosomes, and genomes in a rapid and predictable way is a key approach to system fabrication. Systems biology has provided numerous templates, with an abundance of sequenced genomes deposited daily into publicly accessible databases. Some progress has recently been reported, including the resynthesis of a bacterial genome and its successful insertion into a different bacterial host (Gibson et al., 2010). However, it took researchers nearly 15 years and approximately $30 million to develop various fundamental aspects of this project. Much of this time and cost went into methods development that will hopefully reduce the resources needed to carry out such projects in the future. In addition, new high-throughput methods for large-scale DNA synthesis have recently been described (Matzas et al., 2010; Norville et al., 2010; Tian et al., 2009). However, much more work is needed before these technologies are accessible to the majority of researchers in terms of cost and reliability, and systems biology may provide important clues. For example, examining new organisms may uncover faster and more reliable ways to synthesize large pieces of DNA, revealing new nonchemical methods for DNA synthesis.

A second approach is to develop methods to generate new component functions that can act as sensors, regulators, controllers, and enzyme activities, for example. These components will in turn extend the set of parts from which synthetic biologists can build genetic devices and systems. Synthetic biologists work not only with the design of DNA that encodes genetic circuits but also with the molecular design of biomolecules, such as RNAs and proteins, to perform new functions. Substantial efforts in the field of protein engineering have contributed to the diversity of functions exhibited by protein components (Dougherty and Arnold, 2009). However, even with these advances, the diversity of component activities currently available as parts remains limited, which in turn limits the design of genetic circuits. Systems biology may aid in the development of effective strategies for generating new component functions by providing information on how Nature has evolved different functions for macromolecules.

A third approach is the predictable design of complex genetic circuits that lay the foundation for new biological devices and systems. Many circuits designed and built thus far have relied on our fairly detailed knowledge of how gene transcription is regulated. For example, synthetic circuits have applied concepts of positive and negative feedback to generate systems that sense stimuli, remember past events, and promote cell death in both prokaryotic and eukaryotic cells (Burrill and Silver, 2010; Sprinzak and Elowitz, 2005). However, many of these systems have been built in a fairly ad hoc manner, requiring substantial troubleshooting and iterative design to exhibit desired functions, and they lack the robust performance standards one might expect as an engineer. Going forward, synthetic biologists need to better understand the parts underlying system design, how to predict their function in a particular genetic context, and how to predict their integrated function with other system parts (Ellis et al., 2009; Savageau, 2011). This biological understanding will then be integrated with computational models to develop computer-aided design tools.

What Does Systems Biology Mean to Synthetic Biology?
As with synthetic biology, many different types of research have been categorized as systems biology. Broadly speaking, systems biology represents an approach to biological research that focuses on the interactions between components of a biological system and how those interactions give rise to the dynamic behavior of the system, in contrast to the more traditional reductionist approach of molecular biology, which studies components in isolation from each other (Alon, 2007). Systems biology has been associated with new technologies and methods that allow quantitative measurement of components and component interactions within biological systems, particularly those that allow genome-wide measurements. In addition, because many of these technologies produce large datasets, systems biology has also been associated with computational tools that support the integration and analysis of these datasets to identify static relationships and interactions between components. Finally, because one of the ultimate goals of systems biology is to predict a system’s dynamic behavior from its component parts, computational tools that can model systems-level biological function from underlying components are associated with this field.

However, the field of systems biology currently faces a number of challenges and limitations. Paramount is determining how to correctly analyze and draw valid conclusions from large amounts of different types of data, ranging from genomics and metabolomics to molecular dynamics in many single cells. Effectively addressing this problem may require new mathematical and computer science approaches. A second key challenge is knowing what kinds of measurements to make and how accurate these measurements need to be to fully understand a biological system. Effectively addressing this challenge will require a re-evaluation of how measurements have been made over the past 10 years in systems biology (for instance, the movement from two-hybrid interactions to mass spectrometry to measure protein interactions). It will also require the development of even more sensitive strategies to make time-dependent measurements inside many cells simultaneously. Taken together, these challenges confront systems biology with problems of both sensitivity and scale.

Does the ultimate goal of synthetic biology, the predictable design, construction, and characterization of biological systems, rely on findings and approaches from systems biology? Design, analysis, and understanding are integrally linked in engineering methodology. Therefore, it is reasonable to assume that advances
gained through systems biology in our understanding of how biological components interact to form integrated systems will support efforts in synthetic biology to design engineered biological systems. However, a different viewpoint argues that the design principles that systems biologists elucidate for natural biological systems are products of evolution over many millions of years and thus are limited by the history of what came before. It is possible, then, that the design principles elucidated for natural biological systems may not be necessary or optimal for the engineered systems that synthetic biologists may design from scratch on a computer, which are less constrained by the need to generate new function through evolutionary processes and timescales. Both of these views have merit, and the reality is likely somewhere in between: even if synthetic biologists design biological systems to have certain properties that are not generally found in natural systems (i.e., optimized for troubleshooting, tailoring, reuse, removal, designer identification), a greater understanding of how components interact to form integrated systems will inform and support the design process.

Synergy between Systems and Synthetic Biology
Although synthetic biology did not directly emerge from systems biology, there are important parallels between the two fields. Both systems biology and synthetic biology represent fundamental shifts in approach from the fields they grew out of. Whereas systems biology represents a shift in biological research away from the traditional reductionist study of components in isolation toward the study of integrated components, synthetic biology represents a shift from more traditional genetic engineering research toward an emphasis on engineering principles and methodology in building biological systems. In addition, both fields take a bottom-up approach, with systems biology emphasizing the understanding of biological systems from their underlying components and synthetic biology emphasizing the building of biological systems from modular components.

In examining the parallels between the two fields, it is also useful to examine how the key challenges each field currently faces relate to one another (Figure 1). The challenges synthetic biologists currently face in engineering genetic systems can be classified as relating to either limitations in understanding biological systems or limitations in technical capabilities to study biological systems. The challenges systems biologists currently face in understanding biological systems are related to the complexities associated with studying natural biological systems and the inadequacies of current computational models to capture the physical properties of biological systems. We see several areas where these two fields can be brought together to effectively address these challenges.

Figure 1. The Challenges and Synergies for Systems and Synthetic Biology

The richness and complexity of engineered genetic networks that synthetic biologists could build will be advanced by using the knowledge gained through systems biology research. For example, genome sequencing can provide an increased diversity of biological parts that synthetic biologists can use in their gene circuit designs. More importantly, systems biology will provide not just the physical parts but a fundamental understanding of how these components can be integrated effectively with other components and how biological systems integrate diverse components and regulatory mechanisms to achieve robust information transmission and behaviors. The importance of this contribution is highlighted by the limited diversity of parts and regulatory mechanisms that have been integrated into synthetic gene circuits to date: the majority of engineered systems rely on a limited number of transcriptional regulators and do not exhibit robust behaviors over different timescales and environmental conditions (Elowitz and Leibler, 2000; Gardner et al., 2000; Purnick and Weiss, 2009). In order to move toward the design of integrated genetic systems, synthetic biologists will need to design more sophisticated genetic circuits that use diverse regulatory strategies (specifically, the integration of posttranscriptional and posttranslational mechanisms), balance energetic load, and dynamically modulate system behavior (Lim, 2010; Win et al., 2009).

Another important contribution of systems biology to synthetic biology is associated with the technologies and tools for analyzing biological systems. Synthetic biologists often spend the bulk of their effort in a design, characterization, and optimization loop, where original designs are modified based on characterization data to achieve the desired system behavior. The tools developed by systems biologists to study components in a system and their interactions can be applied to analyzing synthetic systems and troubleshooting their performance. This is particularly true in cases in which the synthetic gene network may have
unanticipated effects on native pathways in the cell that may in turn affect system behavior. A common example of this challenge arises in engineering metabolic pathways, for which synthetic biologists can use genome-wide profiling of transcript, protein, or metabolite levels to identify undesired effects of introducing the synthetic pathway into the host cell on critical functions such as redox balance, cofactor levels, and stress response (Mukhopadhyay et al., 2008). As another example, systems biologists have developed a variety of computational tools for modeling biological systems and sharing information on biological components across different databases. These tools will be useful foundations for synthetic biologists looking to develop methods to standardize and share information across component libraries and to develop computer-aided design tools for biological systems.

Advances in synthetic biology will provide key contributions to systems biology research by creating new tools for interfacing with and manipulating biological systems. Research aimed at understanding a biological system often uses methods to perturb or manipulate that system and examine the resulting behavior of the modified system. Synthetic biologists are developing novel genetic devices that systems biologists can use to interface with native networks and precisely probe and manipulate those systems. For example, synthetic genetic devices have recently been used to rewire signaling pathways and create novel interactions between unrelated cellular components (Culler et al., 2010; Lim, 2010). In addition, synthetic biology can contribute strategies for simplifying and isolating biological components and their interactions through the application of diverse approaches for implementing specific component interactions.

Synthetic biology can also provide new simulation platforms for systems biology. For example, systems biologists currently develop mathematical models to represent the behavior of their systems and use these models to predict that behavior under different perturbations and environments. However, developing these models often requires assumptions that are imperfect matches for the physical model of a cell (i.e., hard sphere, dilute gas models), such that the ability of current computational models to capture system behavior is limited at best. Advances in constructing genetic systems that come from synthetic biology research may enable systems biologists to shift from computational models to physical models of their systems by implementing simulations inside of cells. Specifically, scalable and affordable DNA synthesis technology can allow systems biologists to build many modified versions of natural systems to test their understanding of those systems.

Perspective of the Future
Moving forward, the synergy between synthetic and systems biology will drive transformative advances in biotechnology. The impact includes not only further understanding of the complexity of biological systems but also the ability to use this information to, for instance, design better drugs, commodity manufacturing processes, and cell-based therapies (Ducat et al., 2011). As one example, the complexity of biosynthesis processes that can be engineered has recently been advanced through the integration of a number of pathway construction and optimization tools, including genomic discovery and engineering (Bayer et al., 2009; Ro et al., 2006; Wang et al., 2009), in vivo screens for enzyme activity (Pfleger et al., 2006), and enzyme localization strategies (Dueber et al., 2009). Future efforts will focus on the development of more advanced tools for bioprocess optimization, such as those enabling noninvasive monitoring of pathway flux (Win and Smolke, 2007), closed-loop embedded control of biosynthesis system behavior (Dunlop et al., 2010; Farmer and Liao, 2000), and biosynthesis compartmentalization and specialization. As another example, systems engineering strategies will play key roles in addressing current challenges in cellular therapies by enabling the programming of cell-fate decisions (Culler et al., 2010), differentiated states (Deans et al., 2007), improved engraftment and targeting (Chen et al., 2010), and effective kill switches (Callura et al., 2010). Ultimately, researchers will design systems that incorporate evolution, designing gene circuits that exhibit desired, evolvable behaviors and eventually constructing ecosystems that exhibit dynamic and predictable behavior patterns.

However, it is important to look at history in thinking about the promises and risks of synthetic biology. Molecular biology, and in particular the insertion of foreign genes into microbes, was met with circumspection by both the public and scientific communities. At the time, scientists made promises to the public (for example, the production of human insulin by engineered bacteria) and delivered on at least some of these promises (Villa-Komaroff et al., 1978). So, what can we expect from the interplay between systems biology and synthetic biology in the near and long term? In the near term, we have already seen companies promise to deliver new fuels and carbon-based products (such as plastics), and in 5 years’ time this will be a partial reality, thereby starting to take petroleum out of the production loop. We believe that, in 10 years’ time, many high-value commodities, including drugs, will be produced biologically as the result of synthetic biology efforts. In the much longer time frame of 20 to 50 years, we hope that synthetic biology will lead to new cell-based therapies, the expansion of immunotherapy, synthetic organs and tissues, and the rebuilding of devastated environments and ecosystems.

These anticipated futures bring us to the controversial areas in synthetic biology. How do we think about a future that could involve the reprogramming of entire organisms? Should we consider engineering ecosystems to support sustainable agriculture, environmental remediation, and pathogen removal, and to treat human disease? How far should and can we go in reprogramming life to form new types of cells, tissues, and entire organisms? These are only some of the potential benefits and questions that scientists, engineers, policy makers, governments, and, most importantly, the public will need to ponder. Molecular biologists set standards for the safe use of engineered organisms over 30 years ago. However, as research in synthetic biology advances toward the goal of making biology easier to engineer, the issues of safety and ethical use are being revisited as we write this Essay. In fact,
a recent US government report captures many of the critical issues around public benefits and responsible stewardship (Presidential Commission for the Study of Bioethical Issues, 2010).

Although each field could in principle exist without the other, we instead feel that the natural interplay between design, analysis, and understanding highlights the important relationship between systems biology and synthetic biology. Systems biology brings added layers of information that will further empower future efforts to design synthetic biological systems. Synthetic biology brings new technologies and tools that can be applied to effectively test our understanding of natural biological systems. By integrating the contributions of these rapidly evolving fields, scientists and engineers together will be well positioned to transform health, well-being, and the environment in the years to come.

ACKNOWLEDGMENTS
P.A.S. is supported by funds from the NIH, DOD, DOE, NSF, and the Wyss Institute for Biologically Inspired Engineering. C.D.S. is supported by funds from the NIH, NSF, and the Alfred P. Sloan Foundation. The authors thank Drew Endy and Jeff Way for comments.

REFERENCES
Alon, U. (2007). An Introduction to Systems Biology: Design Principles of Biological Circuits (Boca Raton, FL: Chapman and Hall/CRC Press).
Bayer, T.S., Widmaier, D.M., Temme, K., Mirsky, E.A., Santi, D.V., and Voigt, C.A. (2009). J. Am. Chem. Soc. 131, 6508–6515.
Burrill, D.R., and Silver, P.A. (2010). Cell 140, 13–18.
Callura, J.M., Dwyer, D.J., Isaacs, F.J., Cantor, C.R., and Collins, J.J. (2010). Proc. Natl. Acad. Sci. USA 107, 15898–15903.
Chen, Y.Y., Jensen, M.C., and Smolke, C.D. (2010). Proc. Natl. Acad. Sci. USA 107, 8531–8536.
Culler, S.J., Hoff, K.G., and Smolke, C.D. (2010). Science 330, 1251–1255.
Deans, T.L., Cantor, C.R., and Collins, J.J. (2007). Cell 130, 363–372.
Dougherty, M.J., and Arnold, F.H. (2009). Curr. Opin. Biotechnol. 20, 486–491.
Ducat, D.C., Way, J.C., and Silver, P.A. (2011). Trends Biotechnol. 29, 95–103.
Dueber, J.E., Wu, G.C., Malmirchegini, G.R., Moon, T.S., Petzold, C.J., Ullal, A.V., Prather, K.L., and Keasling, J.D. (2009). Nat. Biotechnol. 27, 753–759.
Dunlop, M.J., Keasling, J.D., and Mukhopadhyay, A. (2010). Syst. Synth. Biol. 4, 95–104.
Ellis, T., Wang, X., and Collins, J.J. (2009). Nat. Biotechnol. 27, 465–471.
Elowitz, M.B., and Leibler, S. (2000). Nature 403, 335–338.
Endy, D. (2005). Nature 438, 449–453.
Farmer, W.R., and Liao, J.C. (2000). Nat. Biotechnol. 18, 533–537.
Gardner, T.S., Cantor, C.R., and Collins, J.J. (2000). Nature 403, 339–342.
Gibson, D.G., Glass, J.I., Lartigue, C., Noskov, V.N., Chuang, R.Y., Algire, M.A., Benders, G.A., Montague, M.G., Ma, L., Moodie, M.M., et al. (2010). Science 329, 52–56.
Haynes, K., and Silver, P.A. (2009). J. Cell Biol. 187, 589–596.
Lim, W.A. (2010). Nat. Rev. Mol. Cell Biol. 11, 393–403.
Matzas, M., Stahler, P.F., Kefer, N., Siebelt, N., Boisguerin, V., Leonard, J.T., Keller, A., Stahler, C.F., Haberle, P., Gharizaden, B., et al. (2010). Nat. Biotechnol. 28, 1291–1294.
Mukhopadhyay, A., Redding, A.M., Rutherford, B.J., and Keasling, J.D. (2008). Curr. Opin. Biotechnol. 19, 228–234.
Norville, J.E., Derda, R., Drinkwater, K.A., Leschziner, A.E., and Knight, T.R. (2010). J. Biol. Eng. 4, 17.
Pfleger, B.F., Pitera, D.J., Smolke, C.D., and Keasling, J.D. (2006). Nat. Biotechnol. 24, 1027–1032.
Presidential Commission for the Study of Bioethical Issues. (2010). http://www.bioethics.gov/.
Purnick, P.E., and Weiss, R. (2009). Nat. Rev. Mol. Cell Biol. 10, 410–422.
Ro, D.K., Paradise, E.M., Ouellet, M., Fisher, K.J., Newman, K.L., Ndungu, J.M., Ho, K.A., Eachus, R.A., Ham, T.S., Kirby, J., et al. (2006). Nature 440, 940–943.
Savageau, M. (2011). Ann. Biomed. Eng. Published online January 4, 2011. 10.1007/s10439-010-0220-2.
Sprinzak, D., and Elowitz, M.B. (2005). Nature 438, 443–448.
Tian, J., Ma, K., and Saaem, I. (2009). Mol. Biosyst. 5, 14–22.
Villa-Komaroff, L., Efstradiadis, A., Broome, S., Lomedico, P., Tizard, R., Naber, S.P., Chick, W.L., and Gilbert, W. (1978). Proc. Natl. Acad. Sci. USA 75, 3727–3731.
Wang, H.H., Isaacs, F.J., Carr, P.A., Sun, Z.Z., Xu, G., Forest, C.R., and Church, G.M. (2009). Nature 460, 894–898.
Win, M.N., and Smolke, C.D. (2007). Proc. Natl. Acad. Sci. USA 104, 14283–14288.
Win, M.N., Liang, J.C., and Smolke, C.D. (2009). Chem. Biol. 16, 298–310.
Leading Edge Minireview
Boosting Signal-to-Noise in Complex Biology: Prior Knowledge Is Power
Trey Ideker,1,2,* Janusz Dutkowski,1 and Leroy Hood3 1Departments of Medicine and Bioengineering 2Institute for Genomic Medicine University of California, San Diego, La Jolla, CA 92093, USA 3Institute for Systems Biology, Seattle, WA 98103, USA *Correspondence: [email protected] DOI 10.1016/j.cell.2011.03.007
A major difficulty in the analysis of complex biological systems is dealing with the low signal-to-noise inherent to nearly all large biological datasets. We discuss powerful bioinformatic concepts for boosting signal-to-noise through external knowledge incorporated in processing units we call filters and integrators. These concepts are illustrated in four landmark studies that have provided model implementations of filters, integrators, or both.
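The filters and integrators named in this abstract can be made concrete with a toy sketch. Here a filter is modeled as prior knowledge that restricts which events are tested, so that a Bonferroni cutoff over fewer hypotheses becomes less stringent, and an integrator is modeled as Fisher's method for combining independent p-values from complementary data layers. Both are standard statistical stand-ins chosen for illustration, not the authors' implementations; the function names, the prior set, and all p-values below are hypothetical.

```python
import math


def bonferroni_detect(events, alpha=0.05):
    """Return entities whose p-value passes a Bonferroni cutoff over all tested events."""
    cutoff = alpha / len(events)
    return sorted(name for name, p in events.items() if p <= cutoff)


def prior_filter(events, prior_set):
    """Toy 'filter' (hypothetical): keep only events whose entity is supported by
    prior knowledge (e.g., membership in a known pathway), shrinking the number of tests."""
    return {name: p for name, p in events.items() if name in prior_set}


def fisher_combine(p_values):
    """Toy 'integrator': Fisher's method for independent p-values in (0, 1].

    X = -2 * sum(ln p) follows a chi-square distribution with 2k degrees of
    freedom under the null; for even degrees of freedom the upper tail is
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)**i / i!.
    """
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    term, tail_sum = 1.0, 1.0
    for i in range(1, len(p_values)):
        term *= half_x / i  # (X/2)**i / i!
        tail_sum += term
    return math.exp(-half_x) * tail_sum


if __name__ == "__main__":
    # Hypothetical scan: 10,000 null events with evenly spaced p-values plus one true signal.
    events = {f"snv{i}": (i + 1) / 10_000 for i in range(10_000)}
    events["snv_causal"] = 2e-5
    print(bonferroni_detect(events))  # cutoff is about 5e-6, so nothing passes

    prior = {"snv_causal", "snv5000", "snv9000"}  # hypothetical prior-knowledge set
    print(bonferroni_detect(prior_filter(events, prior)))  # cutoff is about 0.0167

    # Two modest, independent lines of evidence for the same entity.
    print(fisher_combine([0.01, 0.01]))  # roughly 1.02e-3
```

Under these assumptions, the same p = 2e-5 signal is missed when all 10,001 events are tested but recovered once testing is limited to three prior-supported events, and two modest p = 0.01 results combine to roughly 1e-3. The gain from filtering is only as good as the prior: a true signal that lies outside the prior set is discarded outright.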
Introduction
Complexity is the grand challenge for science and engineering in the 21st century. Complex systems, by definition, have many parts in an intricate arrangement that gives rise to seemingly inexplicable or emergent behaviors. For example, a radio captures an electromagnetic signal and converts it through electronic circuitry into sound that we hear. To most, the radio is a black box with an input (electromagnetic waves) and an output (sound waves). However, understanding the inner workings of this box requires going head-to-head with the challenges of complexity. What are the component parts of the system and how are these parts interconnected? How do these connections influence functions and dynamic system outputs? In biology, ultimately one would like to create models that predict the emergent behaviors of complex entities, and even re-engineer these behaviors to humankind’s benefit.
To decipher complexity, biologists have developed an impressive array of technologies (next-generation sequencing, tandem mass spectrometry, cell-based screening, and so on) that are capable of generating millions of molecular measurements in a single run. This enormous amount of data, however, is typically accompanied by a fundamental problem: an incredibly low signal-to-noise ratio. For example, the millions of single-nucleotide variants (SNVs) found in a typical genome-wide association study or by the International Cancer Genome Consortium (Hudson et al., 2010) make it extremely difficult to identify which particular SNVs are the true causes of disease. Due to the overwhelming number of measurements, such analyses either lack power to detect the true signal or must admit an unacceptable amount of noise.
Fortunately, biologists have two major weapons with which signal-to-noise may be improved. First is what we know about complexity, which can and should be used as strong prior assumptions when analyzing biological data. Known principles of complexity such as modularity, hierarchical organization, evolution, and inheritance (Hartwell et al., 1999) all provide important insights into how biological systems are constructed and how they function. Second is the availability of data in many complementary layers, including the genome, transcriptome, proteome, metabolome, and interactome. A recent wave of new bioinformatic methods has demonstrated how both weapons (strong prior assumptions related to complexity and systematic accumulation of complementary data) can be used together or separately to achieve substantial increases in signal-to-noise.
In what follows, we summarize these developments within a general paradigm for signal detection in biology. Central to this paradigm are processing units we call filters and integrators, which draw on prior biological assumptions and complementary data to reduce noise and to boost statistical power. To illustrate these ideas in context, we review four landmark studies that have provided model implementations of filters and integrators.
The Signal Detection Paradigm
Imagine a biological dataset as a stream of information flowing into a hypothetical signal detection device (Figure 1A). The information flow is quantized into atomic units or events, representing measurements for entities such as genes or proteins, protein interactions, SNVs, pathways, cells, or individuals. Each event contains a certain amount of information, ranging from a single measurement (e.g., strength of protein interaction) to thousands (e.g., an SNV state or gene expression value over a population of patients). Some events represent true biological signals, with the definition of "signal" depending exquisitely on the type of results the experimentalist is looking for (e.g., an SNV causing disease or a true protein interaction; many examples are given later). The remaining events are noise, which can be due to errors that are technical in nature (uncontrollable variation in different instrument readings collected from the same sample) or biological in nature (uncontrollable variation in different samples collected from the same biological condition). An event may also be considered part of noise even if it is biological and reproducible, simply because it encodes aspects of phenotype irrelevant to the current studies.
To make a decision on which events are signal, the device scores each event and accepts those for which the score exceeds a statistically defined decision threshold (Figure 1A). It is precisely this decision that becomes problematic in many large-scale biological studies, in which one either mistakenly rejects a large proportion of the true signal (low statistical power) or must tolerate a high proportion of accepted events that are noise (high false discovery rate or FDR).
860 Cell 144, March 18, 2011 ©2011 Elsevier Inc.
Boosting Signal with Filters and Integrators
To increase signal-to-noise, a pivotal trend in bioinformatics has been to augment the signal detection process with complementary datasets and with prior knowledge about the nature of signal. The vast majority of these approaches fall into one of two categories that we call filters and integrators (Table S1 available online). Filters attempt to cull some events from the information flow immediately and reject them as noise. For example, a detection system for differential expression might reject certain genes immediately if their expression levels fail to exceed a background value in any condition. Integrators, on the other hand, transform the information flow by aggregating individual events into larger units to yield a fundamentally new type of information, or by integrating together different types of information (Hwang et al., 2009). For example, genes might be aggregated into clusters of similar expression or of related function, in which the median levels of the clusters, not their individual genes, are propagated as the "events" on which final accept/reject decisions are made (Park et al., 2007). Importantly, combining filters or integrators results in a new device that can itself be recombined with other signal detection systems in a modular fashion.
Both filters and integrators influence statistical power and FDR, but by fundamentally different means. Filters reduce the fraction of noise passing through the system and, as a consequence, the FDR. Alternatively, as filters are added, FDR can be held constant by relaxing the decision threshold, resulting in higher statistical power (Figures 1B and 1C). By comparison, integrators combine a train of weak signals into fewer, stronger events, leading to an increase in "effect size" and thus a direct increase in statistical power.
These methods complement the more classical means of boosting power by increasing the amount of information per event (also called the sample size) (Figure 1A). In each of the following four examples, boosting power with a combination of filters and integrators has been critical to the success of a landmark genome-scale analysis project.
Figure 1. Boosting Signal-to-Noise in Biological Data using Prior Knowledge
(A) Signal detection paradigm in which an input data stream is routed through a series of filtering and integration units, ending in a statistical test that makes accept or reject decisions. Symbols: m, information per event or sample size; Δt, effect size; α, decision threshold; FDR, false discovery rate.
(B) Probability distribution P(t) of the test statistic t over the entire data stream of signal plus noise (purple). This distribution is factored into a red signal and a blue noise component. FDR and power are visualized in terms of the areas under these curves to the right of α.
(C) Effect of varying parameters on the signal, noise, and signal plus noise probability distributions. The power is increased by more than 6-fold compared to (B), at an identical FDR. Colors are shown as in (B).
(D) MAGENTA, a specific implementation of the signal detection paradigm for pathway-based disease gene mapping as described in Segrè et al. (2010).
Example 1: Pathway-Level Integration of Genome-wide Association Studies
Genome-wide association studies (GWAS) seek to identify polymorphisms, such as SNVs, that cause a disease or other phenotypic trait of interest. Despite the success of this strategy in mapping SNVs underlying many diseases, the identified loci typically explain only a small proportion of the heritable variation. For such diseases, one likely explanation is that the genetic contribution is distributed over many functionally related loci with large collective impact but with only modest individual effects that do not reach genome-wide significance in single-SNV tests (Wang et al., 2010; Yang et al., 2010).
Based on this hypothesis, Segrè et al. (2010) investigated the collective impact of mitochondrial gene variation in type II diabetes. They described a method called MAGENTA that performs a meta-analysis of many different GWAS to achieve larger sample sizes than any single study, thereby increasing statistical power. MAGENTA also includes both filtering and integration steps (Figure 1D). First, a filter is applied so that SNVs that fall far from genes are removed. Next, an integrator is applied to transform SNVs to genes, such that each gene is assigned a score equal to the most significant p value of association among its SNVs. Gene scores are further corrected for confounding factors such as gene size, number of SNVs per kilobase, and genetic linkage. Finally, a second integrator combines the scores across sets of genes assigned to the same biochemical function or pathway, resulting in a single pathway-level p value of association.
Simulation studies using MAGENTA suggest a potentially large boost in power to detect disease associations (Figure S1A). For example, the method has 50% power to detect enrichment for a pathway containing 100 genes of which 10 genes have weak association to the trait of interest. This compares with only 10% power to detect any of the 10 genes at the single-SNV level. At this increased power, MAGENTA did not identify any mitochondrial pathways as functionally associated with type II diabetes, suggesting that mitochondria have overall low genetic contribution to diabetes susceptibility, a surprise given the conventional wisdom about the disease. On the other hand, in an independent analysis of genes influencing cholesterol, MAGENTA identified pathways related to fatty acid metabolism that had been missed by classical GWAS.
Example 2: Mapping Disease Genes in Complete Genomes
Sequencing and analysis of individual human genomes is one of the most exciting emerging areas of biology, made possible by the rapid advances in next-generation sequencing (Metzker, 2010). As complete genome sequencing becomes pervasive, one of the most important challenges will be to determine how such sequences should best be analyzed to map disease genes. The signal filtering and integration paradigm provides an excellent framework for developing methods in this arena. As a landmark example, Roach et al. (2010) described a filtering methodology for disease genes based on the complete genomic sequences of a nuclear family of four. This approach was used to identify just three candidate mutant genes, one of which was responsible for Miller syndrome, a rare recessive Mendelian disorder by which both offspring, but neither parent, were affected.
To begin the analysis, the four genome sequences were processed to identify approximately 3.7 million SNVs across the family. SNVs were then directed through a series of filters (Figure S2A). In the first, SNVs were rejected if they were unlikely to influence a gene-coding region annotated in the human genome reference map (http://genome.ucsc.edu/), leaving approximately 1% of SNVs that led to missense or nonsense mutations or fell precisely onto splice junctions. A second filter removed SNVs that were common in the human population and thus were unlikely to cause a rare Mendelian disorder. Like the first one, this filter yielded an approximate 100-fold decrease in the number of candidates. A third filter was designed to check inheritance patterns, which can be gleaned only from a family of related genomes. SNVs were removed that had a non-Mendelian pattern of inheritance (the result of DNA sequencing errors) or did not segregate as expected for a recessive disease gene, in which each affected child must inherit recessive alleles from both parents. This filter yielded another 4- to 5-fold decrease in candidate SNVs versus using only a single parental genome. Finally, an integrator was used to translate all remaining SNVs into their corresponding genes.
Using the entire system of filters and integrators under a compound heterozygote recessive model, a total of three genes were identified as candidates. One of these (DHODH) was concurrently shown to be the cause of Miller syndrome. In this way, the family genome sequencing approach used the principles of Mendelian genetics (prior knowledge) to correct approximately 70% of the sequencing errors, directly identify rare variants (those present in two or more family members), and enormously reduce the search space for disease traits (corresponding to an increase in statistical power from 0.15% to 33%) (Figure S1B).
Example 3: Assembly of Global Protein Signaling Networks
Another area in which filtering and integration are turning out to be key is assembly of protein networks. An excellent example of network assembly is provided by the recent work of Breitkreutz et al. (2010), in which mass spectrometric analysis was used to report a high-quality network of 1844 interactions centered on yeast kinases and phosphatases. Central to the task of network assembly was a signal detection system for quality control and interpretation of the raw data. The data consisted of a stream of more than 38,000 proteins that had been coimmunoprecipitated with a different kinase or phosphatase used as bait. Bait proteins can interact both specifically and nonspecifically with a wide variety of peptides, and the nonspecific interactions comprise a major source of noise. To remove nonspecific interactions, the authors introduced a method called significance analysis of interactome (SAINT), in which each putative interacting protein is assigned a likelihood of true interaction based on its number of peptide identifications (representing the amount of information per event or sample size) (Figure S2B). After filtering, the remaining protein interactors are funneled to an integrator stage in which they are clustered into modules based on their overall pattern of interactions (Table S1).
The resulting modular interaction network reveals an unprecedented level of crosstalk between kinase and phosphatase units during cell signaling. In this network, kinases and phosphatases are not mere cascades of proteins ordered in a linear fashion. Rather, they are more akin to the neurons of a vast neural network, in which each kinase integrates signals from myriad others, enabling the network to sense cell states, compute functions of these states, and drive an appropriate cellular response. It is likely that evolution tunes this network, such that some interactions dominate and others are minimized in a species-specific fashion. This might help explain two paradoxical effects seen pervasively in both signaling and regulation: (1) the same network across species can be used to control very different phenotypes (McGary et al., 2010); and (2) very different networks across species can be used to execute near-identical responses (Erwin and Davidson, 2009).
Example 4: Filtering Gene Regulatory Networks using Prior Knowledge
One of the grand challenges of biology is to decipher the networks of transcription factors and other regulatory components that drive gene expression, phenotypic traits, and complex behaviors (Bonneau et al., 2007). Toward this goal, probabilistic frameworks such as Bayesian networks have been extensively applied to learn gene regulatory relationships from mRNA expression data gathered over multiple time points and/or experimental conditions (Friedman, 2004). However, due to a limited sample size, a large space of possible networks, and the probabilistic equivalence of many alternative models, these approaches are often unable to find the underlying causal gene relationships.
Recently, Zhu et al. (2008) showed that supplementing gene expression profiles with complementary information on genotypes may help to overcome some of these problems (Figure S2C). These authors sought to assemble a gene regulatory network for the yeast Saccharomyces cerevisiae using previously published mRNA expression profiles gathered for 112 yeast segregants. Rather than assemble a Bayesian network from expression data alone, the data were first supplemented with the genotypes of each segregant. The combined dataset was then analyzed to identify expression quantitative trait loci (eQTL), genetic loci for which different mutant alleles associate with differences in expression for genes at the same locus (cis-eQTL) or for genes located elsewhere in the genome (trans-eQTL). The eQTLs were used as a filter to prioritize some gene relations and demote others. Any candidate cause-effect relations in which the effect gene is near an eQTL were removed, as the cis-eQTL already explains the gene expression changes at that locus. Conversely, cause-effect relations that were supported by trans-eQTLs and passed a formal causality test were prioritized. Supplementing gene expression profiles with genetic information significantly enhanced the power to identify bona fide causal gene relationships. Further improvement was achieved by introducing a second filter that prioritized cause-effect relations that correspond to measured physical interactions, including data from the many genome-wide chromatin immunoprecipitation experiments published for yeast that document physical interactions between transcription factors and gene promoters.
Summary
Biology is expanding enormously in its ability to decipher complex systems. This ability derives from the expanded power to incorporate diverse and complementary data types and to inject prior understanding of biological principles. Signal detection systems such as those discussed here, along with their filters, integrators, and other components, are leading to fundamental new biological discoveries and models, some of which will ultimately transform our understanding of disease and therapeutics. It is also likely that many of the strategies, technologies, and computational tools developed for healthcare can be applied to problems of complexity inherent in other scientific domains, including energy, agriculture, and the environment. Healthcare and energy will demand significant societal resources moving forward and hence offer unique opportunities to push the development and application of approaches for attacking complexity.
SUPPLEMENTAL INFORMATION
Supplemental Information includes two figures and one table and can be found with this article online at doi:10.1016/j.cell.2011.03.007.
ACKNOWLEDGMENTS
We gratefully acknowledge G. Hannum, S. Choi, I. Shmulevich, D. Galas, J. Roach, and N. Price for helpful comments and feedback. This work was funded by grants from the National Center for Research Resources (RR031228, T.I., J.D.), the National Institute of General Medical Sciences (GM076547, L.H.; GM070743 and GM085764, T.I.), the Department of Defense (W911SR-07-C-0101, L.H.), and the Luxembourg strategic partnership (L.H.).
REFERENCES
Bonneau, R., Facciotti, M.T., Reiss, D.J., Schmid, A.K., Pan, M., Kaur, A., Thorsson, V., Shannon, P., Johnson, M.H., Bare, J.C., et al. (2007). Cell 131, 1354–1365.
Breitkreutz, A., Choi, H., Sharom, J.R., Boucher, L., Neduva, V., Larsen, B., Lin, Z.Y., Breitkreutz, B.J., Stark, C., Liu, G., et al. (2010). Science 328, 1043–1046.
Erwin, D.H., and Davidson, E.H. (2009). Nat. Rev. Genet. 10, 141–148.
Friedman, N. (2004). Science 303, 799–805.
Hartwell, L.H., Hopfield, J.J., Leibler, S., and Murray, A.W. (1999). Nature 402, C47–C52.
Hudson, T.J., Anderson, W., Artez, A., Barker, A.D., Bell, C., Bernabe, R.R., Bhan, M.K., Calvo, F., Eerola, I., Gerhard, D.S., et al. (2010). Nature 464, 993–998.
Hwang, D., Lee, I.Y., Yoo, H., Gehlenborg, N., Cho, J.H., Petritis, B., Baxter, D., Pitstick, R., Young, R., Spicer, D., et al. (2009). Mol. Syst. Biol. 5, 252.
McGary, K.L., Park, T.J., Woods, J.O., Cha, H.J., Wallingford, J.B., and Marcotte, E.M. (2010). Proc. Natl. Acad. Sci. USA 107, 6544–6549.
Metzker, M.L. (2010). Nat. Rev. Genet. 11, 31–46.
Park, M.Y., Hastie, T., and Tibshirani, R. (2007). Biostatistics 8, 212–227.
Roach, J.C., Glusman, G., Smit, A.F., Huff, C.D., Hubley, R., Shannon, P.T., Rowen, L., Pant, K.P., Goodman, N., Bamshad, M., et al. (2010). Science 328, 636–639.
Segrè, A.V., Groop, L., Mootha, V.K., Daly, M.J., and Altshuler, D. (2010). PLoS Genet. 6, e1001058.
Wang, K., Li, M., and Hakonarson, H. (2010). Nat. Rev. Genet. 11, 843–854.
Yang, J., Benyamin, B., McEvoy, B.P., Gordon, S., Henders, A.K., Nyholt, D.R., Madden, P.A., Heath, A.C., Martin, N.G., Montgomery, G.W., et al. (2010). Nat. Genet. 42, 565–569.
Zhu, J., Zhang, B., Smith, E.N., Drees, B., Brem, R.B., Kruglyak, L., Bumgarner, R.E., and Schadt, E.E. (2008). Nat. Genet. 40, 854–861.
Leading Edge Perspective
Principles and Strategies for Developing Network Models in Cancer
Dana Pe’er1,2,* and Nir Hacohen3,4,5 1Department of Biological Sciences, Columbia University, 1212 Amsterdam Avenue, New York, NY 10027, USA 2Center for Computational Biology and Bioinformatics, Columbia University, 1130 St. Nicholas Avenue, New York, NY 10032, USA 3Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA 4Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital, 149 13th Street, Charlestown, MA 02129, USA 5Department of Medicine, Harvard Medical School, Boston, MA 02115, USA *Correspondence: [email protected] DOI 10.1016/j.cell.2011.03.001
The flood of genome-wide data generated by high-throughput technologies currently provides biologists with an unprecedented opportunity: to manipulate, query, and reconstruct functional molecular networks of cells. Here, we outline three underlying principles and six strategies to infer network models from genomic data. Then, using cancer as an example, we describe experimental and computational approaches to infer "differential" networks that can identify genes and processes driving disease phenotypes. In conclusion, we discuss how a network-level understanding of cancer can be used to predict drug response and guide therapeutics.
Cells contain a vast array of molecular structures that come together to form complex, dynamic, and plastic networks. The recent development of high-throughput, massively parallel technologies has provided biologists with an extensive, although still incomplete, list of these cellular parts. The emerging challenge over the next decade is to systematically assemble these components into functional molecular and cellular networks and then to use these networks to answer fundamental questions about cellular processes and how diseases derail them. For example, how do these cellular components come together to robustly maintain homeostasis, process exogenous and endogenous signals, and then coordinate responses? How do genetic aberrations disrupt the regulatory network and manifest in disease, such as cancer? In this Perspective, we reason that, even with a partial understanding of molecular networks, biologists are currently poised to understand how networks are deregulated in cancer cells and then predict how these networks might respond to drugs.
Quantitative biophysical network models encompassing a small number of components have made enormous contributions to our understanding of cellular networks. However, in this Perspective, we focus on deriving network models at a large systems scale from high-throughput data, using "data-driven network inference." In this process, a set of modeling assumptions is defined, such as "genetic aberrations alter normal cellular regulation and drive tumor proliferation." Then, data are used to derive a specific model, such as specifying, for each tumor (which typically harbors many aberrant genes), which particular genes drive proliferation. In the end, a "good" model of biological networks should be able to predict the behavior of the network under different conditions and perturbations and, ideally, even help us to engineer a desired response. For example, where in the molecular network of a tumor should a drug intervene to reduce tumor proliferation or metastasis?
Such a global understanding of networks can have transformative value, allowing biologists to dissect out the pathways that go awry in disease and then identify optimal therapeutic strategies for controlling them.
To illustrate the potential impact of global models, we note that the effect of a cancer drug is often hard to predict because crosstalk and feedback are still poorly mapped in most signaling pathways. For example, the mammalian target of rapamycin (mTOR) is critical for cell growth, and its activity is aberrant in most cancers; hence, it was expected to be a good therapeutic target. Nevertheless, targeting it has shown poor results in clinical trials. This deviation from our expectations may be due to feedback and crosstalk between the Akt/mTOR and the extracellular signal-regulated kinase (ERK) pathways (Carracedo et al., 2008). Inhibition of mTOR releases feedback inhibition of the receptor tyrosine kinases, which can activate both ERK and Akt (O’Reilly et al., 2006) and subsequently increase cell proliferation.
For targeted therapy to succeed, a global view of the interconnectivity of signaling proteins and their influences is critical. In this Perspective, we consider the current state and potential future of data-driven computational approaches to network inference, with an emphasis on applications to cancer. We will describe three principles underlying molecular networks and their inference from data. These principles are matched to current experimental capabilities and will need revamping as technological leaps produce new types of data (e.g., more quantitative data and data with real-time dynamics). We then consider six promising experimental-computational strategies for constructing network-level models. Though not exhaustive, these principles and strategies illustrate fruitful directions in network biology and will hopefully stimulate discussion and experimentation among computational and experimental biologists.
Principle 1: Molecular Influences Generate Statistical Relations in Data
Network biology has been empowered by genomics technologies that enable the simultaneous measurement of thousands of molecular species. Such data offer a global unbiased view of the entire system, which in turn necessitates computation and statistics. The key underlying assumption frequently used for inferring networks from genomic data is that influences and interactions between biological entities generate statistical relations in the observed data. For example, if protein A induces expression of protein B, then we expect to see high levels of protein B whenever levels or specific molecular states of its activator A are high. The reverse of this logic is that statistical correlation between protein states indicates a potential interaction between them. In a data-driven manner, a computer can comprehensively test millions of such hypotheses in seconds and provide a statistical score for each candidate molecular interaction or influence. For example, one can test the statistical association between the DNA copy number of a candidate regulator and gene expression of a target for each locus and gene in the genome (see Strategy 4).
Various statistical frameworks have been successfully applied to network inference (Basso et al., 2005; Bonneau et al., 2007; Friedman et al., 2000); the commonality between the frameworks is that they model a target’s behavior as a function of its regulators and search for the most predictive regulator set. For example, Bayesian networks were used to reconstruct detailed signaling pathway structures in human T cells using only the concentration of phosphoproteins simultaneously measured in individual cells (Sachs et al., 2005). Based solely on these data, this network analysis discovered the majority of known influences between the measured signaling components without prior knowledge of any pathways. Moreover, the analysis uncovered a new point of crosstalk, which was confirmed experimentally.
The same computational approach and mathematical formulae correctly reconstructed yeast metabolic networks from gene expression data (Pe’er et al., 2001). Together, these studies demonstrate the universal nature of statistical dependencies; the same formalism can be used to reconstruct yeast metabolic networks from gene expression data and mammalian signaling networks from phosphoprotein abundances.
Mathematical models of molecular networks have been derived from basic biochemical principles for decades, combining chemical reaction equations into a quantitative model. For example, Michaelis-Menten equations are frequently used to model transcription factor binding to DNA. Nevertheless, most contemporary data sets lack the quantitative and statistical power to resolve such models, even for small networks. Data-driven approaches typically necessitate hundreds of samples to gain the statistical power to resolve even a partial qualitative map of molecular interactions. Data requirements are highly dependent on the number of components modeled, the mathematical complexity of the equations representing the molecular interactions, and the effect size of the influences themselves. Thus, at the heart of data-driven modeling is finding the sweet spot in the tradeoff between more realistic models (e.g., chemical reaction equations) and simpler models that can be inferred more robustly from data (e.g., linear regression).
One option is to build qualitative, rather than quantitative, models. These models can identify qualitative features such as "Mek (mitogen-activated protein kinase kinase) activates Erk" or that "Met4 and Met28 are required together to induce sulfur metabolism." If quantitative modeling is important for the problem at hand, linear regression models provide a robust alternative to nonlinear models (e.g., target gene expression is a linear combination of its transcription factors). Although nonlinear relations frequently occur in biology, linear regression models are more robust, and thus they often give better results, even when the underlying model is nonlinear. A detailed molecular model that is exhaustive in its molecular species and in the modeling of their interactions remains beyond our reach for the near future.
A powerful strategy in systems biology is to abstract and simplify models. In the "module-network" approach (Segal et al., 2003), genes are grouped into modules that are assumed to share a regulatory program. The rationale for this grouping is based on numerous examples in which the same regulatory circuits coordinate activation or repression of groups of genes that are involved in the same process (e.g., the entire ribosome complex is regulated by common transcription factors). By pooling many similar genes together, the module-network framework significantly increases the statistical power to identify regulatory influences (Litvin et al., 2009).
Principle 2: Networks Are Not Fixed: The Role of Context and Dynamics
Molecular networks are not static; rather, they exhibit dynamic adaptations in response to both internal states and external signals. Influences that determine network context can be divided into four categories. (1) Genetic background strongly determines network behavior and gives rise to significant differences across individuals (and even cells in the special case of cancer). (2) Cell lineages have dramatically different network structures because of epigenetic changes and differential expression of genes. (3) Tissue milieu can reprogram networks and their behaviors, as stromal cells do for tumors. (4) Exogenous signals, such as nutrients and other chemicals, affect networks (Figure 1). Ultimately, health or disease emerges from an individual’s integration of internal and external cues.
In cancer, context can have a profound impact on how patients respond to therapies. For example, in recent clinical trials of a new generation of rationally targeted therapies (e.g., Gleevec, Herceptin, and BRAF inhibitors for chronic myelogenous leukemia, breast cancer, and melanoma, respectively), even patients who share the targeted mutation and tumor type displayed substantially variable responses to the drugs (Sharma et al., 2010a). In addition, in another recent (phase II) trial, a therapy was extremely effective at reversing tumors in metastatic melanoma patients carrying the oncogenic BRAF mutation (Flaherty et al., 2010), in which this drug effectively shuts down the ERK pathway that is critical for this cancer. Strikingly, however, the same drug leads to the activation of the ERK pathway in cells with wild-type BRAF (Poulikakos et al., 2010), potentially promoting tumors in these cells.
To gauge such network activity, response, and potential, experiments must deliberately perturb the cell. For example, blood cells from acute myeloid leukemia patients could not be
differentiated from healthy cells when only the basal levels of phosphorylation of key signaling molecules were measured. Only when the samples were stimulated with growth factors and cytokines did the resulting signaling profiles correlate with tumor genetics, drug response, and disease outcome (Irish et al., 2004). Interrogation with stimuli matters because many important signaling responses, such as ERK2 activation downstream of the epidermal growth factor receptor (EGFR), depend only on fold change rather than on basal protein levels, which exhibit a high degree of variance (Cohen-Saidon et al., 2009).

Cellular responses often involve multiple feedback loops and additional complexities (see the Review by Yosef and Regev on page 886 of this issue). For example, the transcriptional response to EGF stimulation induces feedback attenuation factors, such as dual-specificity phosphatases (DUSPs), which shut down the very pathways activated by EGF signaling (Amit et al., 2007). Therefore, to understand tumor network function, drug response, and the emergence of drug resistance, tumors must be systematically interrogated with different stimuli and drugs, followed by time series measurements. These measurements can then be used to derive a model describing the quantitative temporal sequence of events from the initial detection of an input to the tumor’s response. The goal would be to generate a model that has a reasonable chance of predicting responses to new, previously unmeasured inputs, such as new drugs or combinations of drugs.

Cell 144, March 18, 2011 ©2011 Elsevier Inc. 865

Principle 3: Extracting “Differential” Networks

Given the importance of context, a central challenge for the field will be to collect data across multiple environments, cell types, and genetic backgrounds using genome-wide profiling to infer network connectivity and function in each context. Rather than explicitly modeling all of the moving parts of a network, we propose that it is feasible to derive models that focus on key components by capturing the essential differences in network wiring, function, and response between contexts (Figure 1). A “differential-network” model is designed to answer the following question: how do a small number of changes to the network (e.g., genetic or epigenetic) alter its function? At the center of such a model are the altered nodes (i.e., genes or proteins), and data-driven computation can be used to (1) identify additional components that interact with these altered nodes; (2) characterize and quantify how these interactions are perturbed; and (3) model how these network perturbations propagate through additional components to generate the phenotype of interest, such as proliferation, invasion, or drug response. For example, Carro et al. (2010) identified C/EBPβ and STAT3 as “master” transcription factors whose overexpression synergistically activates expression of mesenchymal genes and, in turn, tumor aggressiveness in malignant glioma (see Strategy 3).

The network model can be significantly simplified because only the components that play a role in the modeled response need to be identified and included. Importantly, the differential-network strategy does not apply only to disease. It can be used in any context to address questions such as what distinguishes two cell types or how nutrient status affects cellular behavior.

Figure 1. Differential Networks Explain Phenotypic Variation across Contexts
The function of a molecular network is determined by context: genetics, tissue type, environment (e.g., nutrients), cell-cell communication, and small molecules. These influences combine to determine the phenotypic response. The “differential network” (colored nodes and edges) models the essential components that determine how and why a phenotypic response will vary between contexts.

Here, we present six strategies that combine experimental and computational approaches to generate network inference models. Strategies 1 and 2 focus on identifying key components; Strategies 3 and 4 focus on deriving key network components concurrently with their regulatory influences; and Strategies 5 and 6 advance toward increasingly detailed quantitative models of network influences.

Strategy 1: Discovery of Inherited Alleles and Somatic Mutations

Chromosomal aberrations and mutations are a central characteristic of tumor cells. Multiple genetic aberrations collectively influence the expression of thousands of genes, altering the pathways and processes underlying malignant behaviors. The emergence of high-resolution copy number assays and massively parallel sequencing technologies opens the possibility of tracing phenotypic differences back to their genetic source. Large-scale initiatives are currently sequencing thousands of tumor genomes to comprehensively catalog the prevalent sequence mutations and chromosomal aberrations underlying each cancer type. Indeed, entire cancer genomes have already been sequenced in dozens of tumors, revealing a surprising number of mutations and chromosomal aberrations in each
individual cancer (Stephens et al., 2009). On the other hand, exon capture techniques, known as exome sequencing (Ng et al., 2010), concentrate on the coding sequence, which makes up about 1% of the human genome. This technique enables more economical cataloging of coding mutations in cohorts of hundreds of tumors per cancer type. Finally, transcriptome (or RNA) sequencing identifies mutations in expressed coding and noncoding RNAs. Transcriptome sequencing also reveals fusion genes created by translocations with breakpoints in introns, which exon sequencing techniques therefore fail to detect (Maher et al., 2009).

These large-scale sequencing projects have uncovered a staggering diversity of genetic aberrations across tumors. Although each individual tumor typically harbors a large number of aberrations, only a few play a role in pathogenesis. Therefore, distinguishing the genetic changes that promote cancer progression (i.e., driver mutations) from neutral ones (i.e., passenger mutations) is like finding needles in haystacks.

Recurrence across tumors has served as a rule of thumb for identifying driver copy number aberrations (Weir et al., 2007). It was therefore unexpected that only a handful of genes would be recurrently targeted by sequence mutations in each cancer type. The current presumption is that the majority of driver mutations are unique to each tumor. A key unresolved computational challenge is, therefore, to identify the driver mutations associated with each cancer genome. Indeed, these drivers must be identified before a differential-network approach can model how the pathogenic behavior emerges. Computational methods addressing this task are still under development (Akavia et al., 2010; Beroukhim et al., 2010; Carter et al., 2009).

Although recurrence may not occur at the gene level, significant recurrence does occur at the level of pathways. For example, in glioblastoma, the majority of tumors have mutations in each of three signaling pathways: p53, retinoblastoma protein 1 (RB1), and rat sarcoma (RAS)/PI3K (Cancer Genome Atlas Research Network, 2008). Because these findings define pathways, rather than genes, as unifying explanations for tumor progression, it is clear that finding drivers will rely on knowledge of molecular networks.

Unfortunately, existing databases contain insufficient information on pathways. First, the majority of signaling proteins are not associated with any known pathway. Second, existing databases include only a small part of what is known and typically do not take context (e.g., cell type) into account. More sophisticated experimental and computational methods will be needed to define and catalog the components involved in each pathway. A promising direction is the use of systematic experimental and computational approaches to build interaction maps (Amit et al., 2009; Bandyopadhyay et al., 2010), which can subsequently be used to identify key aberrant genes. For example, an algorithm known as interactome dysregulation enrichment analysis (IDEA) (Mani et al., 2008) uses a specially derived context-specific molecular network to identify key aberrant genes in lymphoma.

Strategy 2: Discovering Key Network Components Using RNAi

Although naturally occurring genetic alterations help to nominate causal genes in cancer and other diseases, deliberate perturbation greatly facilitates causal gene identification. Taking advantage of sequenced genomes, mammalian RNA interference (RNAi) libraries have emerged as a central tool for systematic perturbation of any gene. Indeed, RNAi-based screens have proven to be a major tool in cancer research, where cell lines are readily available and cell proliferation and survival provide surrogates of tumorigenesis.

In one strategy, unbiased genome-wide RNAi screens in vitro and in vivo are used to identify candidate causative oncogenes and tumor suppressors that affect cell proliferation or survival. Candidate genes that are found to have an aberrant sequence mutation, copy number alteration, or expression change in tumors are usually selected for deeper mechanistic characterization (Boehm et al., 2007; Ngo et al., 2010). However, one must always keep in mind that candidate genes that are not aberrant may be equally important to study and target therapeutically.

In a second strategy, candidate genes are first selected from cancer genomic data sets and then validated with small-scale RNAi screens. For example, this strategy was recently used to identify critical genes within tumor chromosomal deletions (Ebert et al., 2008) and to find the small subset of genes that affect metastasis among hundreds selectively expressed in metastatic tumors (Bos et al., 2009).

Finally, unbiased screens can also shed light on the susceptibility or resistance of specific tumors to treatment (Hölzel et al., 2010) and reveal ways to enhance the effects of current therapies, such as taxanes (Whitehurst et al., 2007). Indeed, these types of findings can rapidly influence clinical research and practice. In all cases, RNAi serves as a “functional filter” to pinpoint or annotate genes that affect proliferation, death, metastasis, or any other cellular process.

Combining computationally guided experimental design with RNAi screens has enormous untapped potential. Although genome-wide data sets are the most comprehensive, they are also expensive to generate at the large scale required to cover all contexts. A more economical approach is to refine our understanding with iterative cycles of experimentation and computation: computational hypotheses derived from one data set are used to design the experiments for collecting the next data set (Figure 2). For example, protein interaction maps and microarray expression data were used to nominate high-likelihood genes for characterization in an RNAi screen that dissected interactions between influenza and its host (Shapira et al., 2009). This approach deepened our understanding of how the virus manipulates, or is controlled by, key host defenses through direct and indirect interactions with four major host pathways.

In the cancer setting, a good network model combined with computational inferences can suggest which gene combinations, genetic backgrounds, and cell assays (e.g., proliferation, invasion, metabolism) should be matched in searching for new components. For example, multiple mutations must occur together to produce a tumor (Land et al., 1983), necessitating a combinatorial RNAi approach. Because an exhaustive large-scale combinatorial RNAi screen is not feasible, computational selection of likely combinations makes such experiments tractable. Additionally, although most screens are performed in a single genetic background, in reality the functional impact of perturbation is highly dependent on genetic background: disrupting the
expression of a gene can cause death in one cell line and have no effect in another (Luo et al., 2008). Thus, it would be useful to select cell lines with informative genetic backgrounds. Finally, a good model can link genes with specific biological processes (Akavia et al., 2010) and help us efficiently extend RNAi studies to invasion, metabolism, cell-cell interactions, and other poorly understood cancer hallmarks (Hanahan and Weinberg, 2011).

Figure 2. Experimental Design for Network Inference
(A) To comprehensively characterize tumor response to a drug, we suggest profiling a cohort of genetically characterized tumors using multiple technologies, following perturbation with small molecules and RNAi. Data-driven algorithms can then infer differential network models from these data. The inferred models subsequently guide the design of experiments for the next iteration of data collection.
(B) Different genetic backgrounds and experiments can help to identify driver mutations and network structure. Each identified mutation recurs in a subset of samples, and driver targets are identified by knockdown with RNAi or inhibition with a drug.

Strategy 3: Statistical Identification of Dysregulated Genes and Their Regulators

After discovering key network components, the next step is to decipher the wiring of the network. Most of the computational work in this area has relied on the analysis of tumor gene expression profiles, which have accumulated on the order of tens of thousands of microarrays over the past decade. Unlike the top-down strategies described above, this approach is bottom up: first identify the differentially expressed genes relevant to a tumor phenotype of interest, and then use these genes to pinpoint the master regulator that brings about their dysregulation.

Data-driven approaches (Principle 1) have been particularly powerful at locating the dysregulated genes and regulatory relations within tumor-related pathways. Analysis of glioblastoma gene expression profiles using ARACNE (algorithm for the reconstruction of accurate cellular networks) (Basso et al., 2005)
revealed two master regulators of mesenchymal transformation in malignant glioma (Carro et al., 2010). The analysis identified both the gene module corresponding to the mesenchymal transformation and the transcription factors most likely to regulate this module, based on the mutual information between each regulator and its targets. Both transcription factors were then confirmed experimentally.

By extending this statistical reasoning to higher dimensions, the MINDY (modulator inference by network dynamics) algorithm (Wang et al., 2009) identified posttranslational activators and inhibitors of master regulators. Based on the assumption that high (or low) expression of such activators (or inhibitors) would lead to increased (or reduced) coregulation of MYC with its known targets, MINDY uncovered new posttranslational modifiers of MYC in human B lymphocytes, four of which were validated using RNAi. Demonstrating the generality of the statistical approach, the identified modifiers were found to act by diverse mechanisms, including protein turnover, transcription complex formation, and selective enzyme recruitment.

As we wait for experimental technologies that detect most posttranslational changes in high throughput, thousands of existing mRNA expression data sets can benefit from this powerful statistical approach to predict key modulators of regulatory activity acting by any biochemical mechanism. We have thus only begun to tap the potential of these approaches to uncover the regulatory mechanisms that lead to tumors and other pathogenic phenotypes. Moreover, once profiles of cancer proteomes and their posttranslational modifications become more readily available, these methods will be dramatically empowered.

Strategy 4: Integrating Genotype and Gene Expression into Causal Models

Current analysis has only scratched the surface of existing data sets, and there is a critical need for powerful computational approaches to expose the wealth of hidden information. A promising approach is “data integration,” which builds a model from diverse data types (e.g., gene sequence, gene expression profiles, and protein-protein interactions), each of which sheds a different light on the underlying biology. The resulting combination is more than the sum of its parts (see the MiniReview by Ideker et al. on page 860 of this issue). A natural integration that captures the essence of differential networks combines sequence and expression.

For example, the CONEXIC (copy number and expression in cancer) algorithm (Akavia et al., 2010) combines DNA copy number with gene expression levels to identify driver mutations and predict the processes that they alter. The modeling assumptions underlying the data integration are as follows: (1) a driver mutation should co-vary with a gene module involved in tumorigenesis (i.e., the module’s expression is assumed to be “modulated” by the driver); and (2) the expression level of the driver, rather than its copy number, controls the malignant phenotype (because other mechanisms may lead to similar dysregulated expression of the driver gene).

This approach predicted two new tumor dependencies in melanoma and the processes that they alter, and these predictions were then confirmed using RNAi. CONEXIC thus uses gene expression as an intermediary to connect genotype to phenotype, building a cascade of events from DNA, through modulated gene expression, to tumorigenic phenotype. Anchoring the model at the DNA provided support for a causal influence of the driver on the module, although this influence can still be indirect, acting through a cascade of unknown mechanisms.

Though such modeling approaches have only recently taken hold in cancer genomics, they have been developing in genetic association studies for a few years. Chen and colleagues identified gene networks that are perturbed by quantitative trait loci (QTL) and that in turn lead to metabolic disease (Chen et al., 2008). A single comprehensive computation locates the QTL, identifies how it perturbs the molecular network, and shows how this perturbation in turn leads to variation in disease traits. As more data types that capture the “state” of the network are collected (e.g., metabolite concentrations measured by mass spectrometry), these differential-network (Principle 3) approaches will lead to increasingly mechanistic and causal models of disease.

Although this strategy can be applied to any process or disease, cancer is particularly suited to these approaches because somatic mutations driving tumorigenesis typically have a large impact on multiple genes and cellular processes, so their effect is more easily detected. Disease genes based on germline mutations that persist through powerful evolutionary filters typically have more subtle effects that are harder to detect; indeed, disease frequently arises only from the combinatorial interaction of many genes.

As a proof of concept of “personalized medicine” using yeast as a model system, CAMELOT (causal modeling with expression linkage for complex traits) (Chen et al., 2009) integrated genotype and gene expression levels (measured prior to drug exposure) to quantitatively predict drug sensitivity. Applying a differential-network approach, a small number of causative genes are identified and then used to build regression models that predict drug response for each yeast strain. The algorithm faithfully predicted both the causal genes (24/24 predictions validated) and drug response. Although epistatic relations existed between genes, the statistical simplicity of linear models led to more robust and accurate models from data. We anticipate that a comparable data set from patient tumors (including genotype, basal gene expression, and quantitative drug response) could be used to rationally select each individual patient’s drug treatment, essentially customizing and optimizing patient care.

Strategy 5: Integration of Single Cell Data to Account for Cell-to-Cell Heterogeneity

Whereas the measurements discussed thus far were taken over population aggregates using bulk assays, most signal processing occurs at the level of the individual cell. Over the past decade, studies have repeatedly demonstrated a large degree of heterogeneity between individual cells, even within clonal populations. This variation arises from differences in protein concentrations and from stochastic fluctuations in biochemical reactions involving molecules with low copy numbers. A common finding is that a response appears dose dependent in bulk assays but is actually an “all or nothing” response in single cells. That is, the intensity of the single cell response remains constant across doses, but the fraction of the cells that respond increases
with dose (e.g., NF-κB in response to TNFα) (Tay et al., 2010). In these cases, there are a number of distinct subpopulations, and no individual cell behaves in accordance with the population average. Such subpopulations confound network inference algorithms when two molecules exhibit statistical dependency at the population level but actually reside in mutually exclusive cells.

Heterogeneity of molecules at the single cell level can have crucial functional impact. Even clonal cell lines treated with drugs under carefully controlled conditions exhibit a large, previously unappreciated degree of variation in cell survival and other parameters (Cohen et al., 2008). A bulk growth assay can mask a small subpopulation of drug-resistant cells, which can later form a drug-resistant tumor. Though much debate still exists regarding the origins and emergence of these subpopulations, it is clear that such populations often exist in tumors. For example, Sharma and colleagues identified a drug-tolerant state that can be transiently acquired and relinquished through reversible epigenetic changes that occur at low frequency (Sharma et al., 2010b). Therefore, to model drug response in tumors, it is vital to observe the system at the single cell level and take heterogeneity (stochastic, genetic, and microenvironmental) into account.

A unique and beneficial feature of single cell data is the simultaneous observation of multiple signaling proteins in each individual cell. The stochastic variation observed across individual cells can be harnessed as a data-rich source for network inference, in which each of many thousands of cells can be treated as an individual sample (Sachs et al., 2005). This strategy provides significantly more samples than are available in bulk assays (e.g., each microarray is only a single sample). Nevertheless, this amount of data comes with a technical tradeoff. To identify interactions and their function, the participating signaling proteins need to be measured simultaneously in the same sample. Typically, single cell measurement technologies are limited to a small number of simultaneous channels (approximately four to ten channels for flow cytometry and approximately three channels for microscopy), with microscopy having the unique advantage of real-time tracking across space and time. A promising emerging technology is mass spectrometry-based single cell cytometry (Ornatsky et al., 2008), which currently can measure up to 35 antibodies in a single cell, with the potential to scale up to 100. This approach will likely break new ground by enabling the study of midscale networks in individual cells. We must rely on clever chemists, engineers, and physicists, and hope that they will take on this important challenge of measuring many molecular states in live, single cells over time and space.

In the meantime, computational approaches can help bridge the gap by (1) pointing to a small number of key components in a differential network, which would be valuable to analyze at the single cell level, and (2) stitching together small, overlapping subnetworks into larger network models (Sachs et al., 2009). But there remains a need to develop methods for integrating genomic data sets at the population level with single cell measurements over small subsets of components at critical network junctures, leading to a more accurate model of the underlying cellular computations.

Strategy 6: Using Perturbations to Reveal Network Wiring

To infer network models that describe how a network responds to stimuli, as well as through what molecular interactions and mechanisms this sensing and response occur, comprehensive profiles must be measured following perturbations. We consider three methods to perturb the system: RNAi, drugs, and natural variation. As this strategy is still under development, this section is more speculative.

Measuring network behavior following an RNAi perturbation uncovers the functions of a gene and provides definitive causal links between network components. A key strength of RNAi is that it can be used effectively to target any desired gene. However, RNAi also has limitations due to its slow kinetics and potential nonspecific cellular responses (e.g., innate immune response to double-stranded RNA, overloading of the RNAi machinery, and off-target effects). Using RNAi-based perturbations followed by comprehensive measurements, Amit et al. (2009) recently developed a network model of transcriptional regulation in the pathogen-sensing response. Candidate regulators and a reduced signature response were first selected from microarray data of cells stimulated with pathogens. Each candidate was then knocked down with RNAi, and the effect on the signature was quantified. This strategy uncovered many new factors involved in pathogen sensing and generated an informative network wiring diagram that revealed new crosstalk and feedback in these pathways. This strategy and its variations should succeed in reconstructing medium-size molecular networks in other systems.

A second perturbation to consider is small molecules, which often have unique and valuable properties for network modeling and direct relevance to patient care. First, in contrast to the slow kinetics of RNAi, the rapid action of small molecules allows accurate control of both dose and timing, leading to simpler interpretations of their effects, without the need to consider network adaptation. Second, small molecules can have specific biochemical effects on proteins, eliminating edges in the network rather than entire nodes, as RNAi does. By comprehensively monitoring the resulting changes in the network upon drug perturbation, we can refine network models and, importantly, discover how pathway activation, crosstalk, and feedback differ across individual tumors with variable levels of drug sensitivity.

Third, variation in the DNA across individuals is a powerful resource for studying the effects of perturbation on network function. It is also effective for detecting regulatory interactions, uncovering complex phenotypes, and inferring networks (Lee et al., 2006). In contrast to the deliberate and somewhat dramatic disruption of a gene’s function through RNAi or drugs, more subtle effects, such as the attenuation or alteration of function, can be observed in genetically divergent individuals. Natural variation provides us with numerous genetic alterations in various combinations, as selected by evolution to produce functional pathways. By monitoring functional pathways in action, we can infer how network components work together under different conditions. Each individual’s genetic variation provides distinct information linking genotype and phenotype and helps to explain network behavior.
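The knockdown-and-measure design described above for Strategy 6 can be illustrated with a minimal Python sketch. It assumes a hypothetical layout in which each signature gene is measured in replicate after knocking down each candidate regulator, and also in control (nontargeting) knockdowns. An edge from a regulator to a gene is called when the knockdown shifts the gene’s mean expression by at least `z_threshold` standard deviations of the control replicates, and the direction of the shift gives the sign of the edge. The function name `call_edges`, the toy regulators (REG_A, REG_B), the signature genes (SIG_1 to SIG_3), their values, and the z-score rule are all illustrative assumptions, not the analysis used by Amit et al. (2009).

```python
from statistics import mean, stdev


def call_edges(control, knockdowns, z_threshold=3.0):
    """Call signed regulator -> gene edges from knockdown profiles (sketch).

    control: dict mapping gene -> expression values from control replicates.
    knockdowns: dict mapping regulator -> {gene: expression values measured
        after knocking down that regulator}.
    Returns (regulator, gene, sign, z) tuples. sign is "+" when knockdown
    lowers the gene (the regulator normally activates it) and "-" when
    knockdown raises it (the regulator normally represses it).
    """
    edges = []
    for regulator, profile in knockdowns.items():
        for gene, values in profile.items():
            baseline = control[gene]
            mu, sd = mean(baseline), stdev(baseline)
            if sd == 0:
                continue  # no control variance to scale the effect against
            z = (mean(values) - mu) / sd
            if abs(z) >= z_threshold:
                sign = "+" if z < 0 else "-"
                edges.append((regulator, gene, sign, round(z, 2)))
    return edges


# Hypothetical toy data: log-scale expression of three signature genes.
CONTROL = {
    "SIG_1": [10.0, 10.4, 9.6],
    "SIG_2": [8.0, 8.2, 7.8],
    "SIG_3": [5.0, 5.3, 4.7],
}
KNOCKDOWNS = {
    "REG_A": {"SIG_1": [6.1, 6.0, 5.9], "SIG_2": [8.1, 7.9, 8.0], "SIG_3": [5.1, 4.9, 5.0]},
    "REG_B": {"SIG_1": [10.1, 9.9, 10.0], "SIG_2": [8.0, 8.1, 7.9], "SIG_3": [7.0, 7.2, 6.8]},
}

edges = call_edges(CONTROL, KNOCKDOWNS)
for regulator, gene, sign, z in edges:
    print(f"{regulator} -> {gene} ({sign}), z = {z}")
```

With these toy values, only REG_A → SIG_1 (activation) and REG_B → SIG_3 (repression) pass the threshold. The z-score against a handful of control replicates is used here only for simplicity; with so few replicates, a proper statistical test and a correction for multiple testing would be more appropriate.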
What still needs to be developed is an integrated experimental-computational strategy that combines stimulations and perturbations with functional measurements from the same cells to build network models. Variation in stimuli and environment allows us to derive what the network is computing, and perturbations to its components elucidate how the network computes it. This suggests expanding the framework set forth by Amit and colleagues (Amit et al., 2009) to additional dimensions, including time series of gene expression and proteomic measurements following each combination of stimuli and perturbations. Natural variation between individuals and tumors, combined with targeted perturbations using RNAi or drugs, will provide particularly powerful data for deriving tumor network models.

Executing the experimental design proposed above requires technological developments. Much of the dynamics occurs at the level of proteins and their modifications, raising the need for high-throughput proteomics to measure protein abundances and activity states. Importantly, the proposed design requires assaying a prohibitively large number of samples. To make significant progress in understanding molecular networks, there is a critical need for more economical multiplex functional assays that can measure thousands of molecular species per sample at low cost. An iterative approach, in which computational modeling with existing data guides the selection of the next set of experiments, will provide the most cost-effective design (Figure 2).

New experimental technologies are rapidly progressing, with computational efforts lagging behind. For example, generating transcriptome sequence reads is easy, but their assembly remains challenging. To realize the enormous potential of the data types delineated above, significant advances in computational modeling are required. Specifically, there is a need for a transition from static and qualitative models to temporal and quantitative models.

Future: Personalized Cancer Medicine

Networks govern fundamental processes, such as the development of a multicellular organism from a single cell and communication between immune cells in response to a pathogen. Fueled by technology and computation, research in the coming decade is expected to unravel the details and principles behind diverse molecular networks and how they compute life’s functions. For example, the ongoing revolution that has enabled the sequencing of individual genomes provides the first opportunities to systematically study and explain how DNA variation results in our phenotypic diversity. Reaching these goals, however, will also necessitate a deeper understanding of the biophysical principles underlying signal processing in small biological circuits and how these come together in systems of increasing size and complexity.

Within cancer research, systems biology is dramatically advancing our mechanistic understanding of tumor progression and the design of personalized therapeutics. Continued success, however, will depend on critical advances in both experimental and computational methods. Improvements in tools for measurement (especially mass spectrometry and cost-effective multiplex detection) and for perturbation (especially RNAi and small molecules) will fill in our understanding of the many molecular layers that underlie network function. On the computational end, the key bottleneck is the development of validated computational methods that integrate heterogeneous data and build differential-network models on a per-tumor basis. These methods are required to (1) identify the genetic aberrations and the master regulators that drive proliferation, survival, metastasis, and drug resistance; (2) model the adaptive and feedback mechanisms that thwart the efficacy of potent drugs; and (3) predict additional target pathways for combinatorial drug treatment. Based on these predictions, more data can be collected to refine the models in iterative rounds of computation and experiments. As three-dimensional models of cancer (Ridky et al., 2010) continue to develop, we can also profile multiple cell types in a tumor environment and model the interactions between them. In short, these studies should teach us what drives cancers and what parts of the networks we should target, both initially and after the network adapts and mutates.

Many of us believe that the ultimate solutions to minimizing cancer lie in combinatorial, patient-specific drug therapy, immunotherapy, and gene therapy. Accurate quantitative models of tumor networks should predict the effects of drug perturbations and thus enable sophisticated rational therapy with optimized dosage, timing, and drug combination for each individual tumor. Drug combinations can address feedback and network adaptation, ensuring shutdown of the necessary pathways. Additionally, drug combinations can target distinct subpopulations within a tumor.

Tumor networks are armed with the ability to adapt and rapidly evolve and are thus a powerful adversary. They need to be met with equally sophisticated and flexible therapy regimes that can track these adaptations and dynamically adapt over time, placing us several moves ahead of the tumor. Studying the emergence of drug resistance both in vitro (Johannessen et al., 2010) and in vivo can better inform methods to anticipate potential paths of resistance. The ultimate therapies would involve sending “networks” in vivo to track tumor behavior and control the dosage and timing of drug release in response to tumor behavior. This long-term goal should become feasible as the fields of network biology, synthetic biology, and appropriate drug delivery methods mature. In the immediate future, however, our goal should be to anticipate and monitor real-time changes in the tumor’s network and adapt our therapies accordingly.

ACKNOWLEDGMENTS

The authors would like to thank Arnon Arazi, Andrea Califano, William Hahn, Andreja Jovic, Oren Litvin, Neal Rosen, Sagi Shapira, and Cathy Wu for valuable comments, and Oren Litvin for help with the illustrations. This research was supported by the NIH Director’s New Innovator Award Program through grant numbers DP2-OD002414-01 (D.P.) and DP2 OD002230 (N.H.), as well as NIAID U54 AI057159 (N.H.). D.P. holds a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and a Packard Fellowship for Science and Engineering.

REFERENCES

Akavia, U.D., Litvin, O., Kim, J., Sanchez-Garcia, F., Kotliar, D., Causton, H.C., Pochanard, P., Mozes, E., Garraway, L.A., and Pe’er, D. (2010). An integrated approach to uncover drivers of cancer. Cell 143, 1005–1017.
Amit, I., Citri, A., Shay, T., Lu, Y., Katz, M., Zhang, F., Tarcic, G., Siwak, D., Lahad, J., Jacob-Hirsch, J., et al. (2007). A module of negative feedback regulators defines growth factor signaling. Nat. Genet. 39, 503–512.
Amit, I., Garber, M., Chevrier, N., Leite, A.P., Donner, Y., Eisenhaure, T., Guttman, M., Grenier, J.K., Li, W., Zuk, O., et al. (2009). Unbiased reconstruction of a mammalian transcriptional network mediating pathogen responses. Science 326, 257–263.
Bandyopadhyay, S., Chiang, C.Y., Srivastava, J., Gersten, M., White, S., Bell, R., Kurschner, C., Martin, C.H., Smoot, M., Sahasrabudhe, S., et al. (2010). A human MAP kinase interactome. Nat. Methods 7, 801–805.
Basso, K., Margolin, A.A., Stolovitzky, G., Klein, U., Dalla-Favera, R., and Califano, A. (2005). Reverse engineering of regulatory networks in human B cells. Nat. Genet. 37, 382–390.
Beroukhim, R., Mermel, C.H., Porter, D., Wei, G., Raychaudhuri, S., Donovan, J., Barretina, J., Boehm, J.S., Dobson, J., Urashima, M., et al. (2010). The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905.
Boehm, J.S., Zhao, J.J., Yao, J., Kim, S.Y., Firestein, R., Dunn, I.F., Sjostrom, S.K., Garraway, L.A., Weremowicz, S., Richardson, A.L., et al. (2007). Integrative genomic approaches identify IKBKE as a breast cancer oncogene. Cell 129, 1065–1079.
Bonneau, R., Facciotti, M.T., Reiss, D.J., Schmid, A.K., Pan, M., Kaur, A., Thorsson, V., Shannon, P., Johnson, M.H., Bare, J.C., et al. (2007). A predictive model for transcriptional control of physiology in a free living cell. Cell 131, 1354–1365.
Bos, P.D., Zhang, X.H., Nadal, C., Shu, W., Gomis, R.R., Nguyen, D.X., Minn, A.J., van de Vijver, M.J., Gerald, W.L., Foekens, J.A., and Massagué, J. (2009). Genes that mediate breast cancer metastasis to the brain. Nature 459, 1005–1009.
Cancer Genome Atlas Research Network. (2008). Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068.
Carracedo, A., Ma, L., Teruya-Feldstein, J., Rojo, F., Salmena, L., Alimonti, A., Egia, A., Sasaki, A.T., Thomas, G., Kozma, S.C., et al. (2008). Inhibition of mTORC1 leads to MAPK pathway activation through a PI3K-dependent feedback loop in human cancer. J. Clin. Invest. 118, 3065–3074.
Carro, M.S., Lim, W.K., Alvarez, M.J., Bollo, R.J., Zhao, X., Snyder, E.Y., Sulman, E.P., Anne, S.L., Doetsch, F., Colman, H., et al. (2010). The transcriptional network for mesenchymal transformation of brain tumours. Nature 463, 318–325.
Carter, H., Chen, S., Isik, L., Tyekucheva, S., Velculescu, V.E., Kinzler, K.W., Vogelstein, B., and Karchin, R. (2009). Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations. Cancer Res. 69, 6660–6667.
Chen, Y., Zhu, J., Lum, P.Y., Yang, X., Pinto, S., MacNeil, D.J., Zhang, C., Lamb, J., Edwards, S., Sieberts, S.K., et al. (2008). Variations in DNA elucidate molecular networks that cause disease. Nature 452, 429–435.
Chen, B.J., Causton, H.C., Mancenido, D., Goddard, N.L., Perlstein, E.O., and Pe'er, D. (2009). Harnessing gene expression to identify the genetic basis of drug resistance. Mol. Syst. Biol. 5, 310.
Cohen, A.A., Geva-Zatorsky, N., Eden, E., Frenkel-Morgenstern, M., Issaeva, I., Sigal, A., Milo, R., Cohen-Saidon, C., Liron, Y., Kam, Z., et al. (2008). Dynamic proteomics of individual cancer cells in response to a drug. Science 322, 1511–1516.
Cohen-Saidon, C., Cohen, A.A., Sigal, A., Liron, Y., and Alon, U. (2009). Dynamics and variability of ERK2 response to EGF in individual living cells. Mol. Cell 36, 885–893.
Ebert, B.L., Pretz, J., Bosco, J., Chang, C.Y., Tamayo, P., Galili, N., Raza, A., Root, D.E., Attar, E., Ellis, S.R., and Golub, T.R. (2008). Identification of RPS14 as a 5q- syndrome gene by RNA interference screen. Nature 451, 335–339.
Flaherty, K.T., Puzanov, I., Kim, K.B., Ribas, A., McArthur, G.A., Sosman, J.A., O'Dwyer, P.J., Lee, R.J., Grippo, J.F., Nolop, K., and Chapman, P.B. (2010). Inhibition of mutated, activated BRAF in metastatic melanoma. N. Engl. J. Med. 363, 809–819.
Friedman, N., Linial, M., Nachman, I., and Pe'er, D. (2000). Using Bayesian networks to analyze expression data. J. Comput. Biol. 7, 601–620.
Hanahan, D., and Weinberg, R.A. (2011). Hallmarks of cancer: The next generation. Cell 144, 646–674.
Hölzel, M., Huang, S., Koster, J., Ora, I., Lakeman, A., Caron, H., Nijkamp, W., Xie, J., Callens, T., Asgharzadeh, S., et al. (2010). NF1 is a tumor suppressor in neuroblastoma that determines retinoic acid response and disease outcome. Cell 142, 218–229.
Irish, J.M., Hovland, R., Krutzik, P.O., Perez, O.D., Bruserud, O., Gjertsen, B.T., and Nolan, G.P. (2004). Single cell profiling of potentiated phospho-protein networks in cancer cells. Cell 118, 217–228.
Johannessen, C.M., Boehm, J.S., Kim, S.Y., Thomas, S.R., Wardwell, L., Johnson, L.A., Emery, C.M., Stransky, N., Cogdill, A.P., Barretina, J., et al. (2010). COT drives resistance to RAF inhibition through MAP kinase pathway reactivation. Nature 468, 968–972.
Land, H., Parada, L.F., and Weinberg, R.A. (1983). Tumorigenic conversion of primary embryo fibroblasts requires at least two cooperating oncogenes. Nature 304, 596–602.
Lee, S.I., Pe'er, D., Dudley, A.M., Church, G.M., and Koller, D. (2006). Identifying regulatory mechanisms using individual variation reveals key role for chromatin modification. Proc. Natl. Acad. Sci. USA 103, 14062–14067.
Litvin, O., Causton, H.C., Chen, B.J., and Pe'er, D. (2009). Modularity and interactions in the genetics of gene expression. Proc. Natl. Acad. Sci. USA 106, 6441–6446.
Luo, B., Cheung, H.W., Subramanian, A., Sharifnia, T., Okamoto, M., Yang, X., Hinkle, G., Boehm, J.S., Beroukhim, R., Weir, B.A., et al. (2008). Highly parallel identification of essential genes in cancer cells. Proc. Natl. Acad. Sci. USA 105, 20380–20385.
Maher, C.A., Kumar-Sinha, C., Cao, X., Kalyana-Sundaram, S., Han, B., Jing, X., Sam, L., Barrette, T., Palanisamy, N., and Chinnaiyan, A.M. (2009). Transcriptome sequencing to detect gene fusions in cancer. Nature 458, 97–101.
Mani, K.M., Lefebvre, C., Wang, K., Lim, W.K., Basso, K., Dalla-Favera, R., and Califano, A. (2008). A systems biology approach to prediction of oncogenes and molecular perturbation targets in B-cell lymphomas. Mol. Syst. Biol. 4, 169.
Ng, S.B., Buckingham, K.J., Lee, C., Bigham, A.W., Tabor, H.K., Dent, K.M., Huff, C.D., Shannon, P.T., Jabs, E.W., Nickerson, D.A., et al. (2010). Exome sequencing identifies the cause of a mendelian disorder. Nat. Genet. 42, 30–35.
Ngo, V.N., Young, R.M., Schmitz, R., Jhavar, S., Xiao, W., Lim, K.H., Kohlhammer, H., Xu, W., Yang, Y., Zhao, H., et al. (2010). Oncogenically active MYD88 mutations in human lymphoma. Nature 470, 115–119.
O'Reilly, K.E., Rojo, F., She, Q.B., Solit, D., Mills, G.B., Smith, D., Lane, H., Hofmann, F., Hicklin, D.J., Ludwig, D.L., et al. (2006). mTOR inhibition induces upstream receptor tyrosine kinase signaling and activates Akt. Cancer Res. 66, 1500–1508.
Ornatsky, O.I., Lou, X., Nitz, M., Schäfer, S., Sheldrick, W.S., Baranov, V.I., Bandura, D.R., and Tanner, S.D. (2008). Study of cell antigens and intracellular DNA by identification of element-containing labels and metallointercalators using inductively coupled plasma mass spectrometry. Anal. Chem. 80, 2539–2547.
Pe'er, D., Regev, A., Elidan, G., and Friedman, N. (2001). Inferring subnetworks from perturbed expression profiles. Bioinformatics 17 (Suppl 1), S215–S224.
Poulikakos, P.I., Zhang, C., Bollag, G., Shokat, K.M., and Rosen, N. (2010). RAF inhibitors transactivate RAF dimers and ERK signalling in cells with wild-type BRAF. Nature 464, 427–430.
Ridky, T.W., Chow, J.M., Wong, D.J., and Khavari, P.A. (2010). Invasive three-dimensional organotypic neoplasia from multiple normal human epithelia. Nat. Med. 16, 1450–1455.
Sachs, K., Perez, O., Pe'er, D., Lauffenburger, D.A., and Nolan, G.P. (2005). Causal protein-signaling networks derived from multiparameter single-cell data. Science 308, 523–529.
Sachs, K., Itani, S., Carlisle, J., Nolan, G.P., Pe'er, D., and Lauffenburger, D.A. (2009). Learning signaling network structures with sparsely distributed data. J. Comput. Biol. 16, 201–212.
Segal, E., Shapira, M., Regev, A., Pe'er, D., Botstein, D., Koller, D., and Friedman, N. (2003). Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat. Genet. 34, 166–176.
Shapira, S.D., Gat-Viks, I., Shum, B.O., Dricot, A., de Grace, M.M., Wu, L., Gupta, P.B., Hao, T., Silver, S.J., Root, D.E., et al. (2009). A physical and regulatory map of host-influenza interactions reveals pathways in H1N1 infection. Cell 139, 1255–1267.
Sharma, S.V., Haber, D.A., and Settleman, J. (2010a). Cell line-based platforms to evaluate the therapeutic efficacy of candidate anticancer agents. Nat. Rev. Cancer 10, 241–253.
Sharma, S.V., Lee, D.Y., Li, B., Quinlan, M.P., Takahashi, F., Maheswaran, S., McDermott, U., Azizian, N., Zou, L., Fischbach, M.A., et al. (2010b). A chromatin-mediated reversible drug-tolerant state in cancer cell subpopulations. Cell 141, 69–80.
Stephens, P.J., McBride, D.J., Lin, M.L., Varela, I., Pleasance, E.D., Simpson, J.T., Stebbings, L.A., Leroy, C., Edkins, S., Mudie, L.J., et al. (2009). Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature 462, 1005–1010.
Tay, S., Hughey, J.J., Lee, T.K., Lipniacki, T., Quake, S.R., and Covert, M.W. (2010). Single-cell NF-kappaB dynamics reveal digital activation and analogue information processing. Nature 466, 267–271.
Wang, K., Saito, M., Bisikirska, B.C., Alvarez, M.J., Lim, W.K., Rajbhandari, P., Shen, Q., Nemenman, I., Basso, K., Margolin, A.A., et al. (2009). Genome-wide identification of post-translational modulators of transcription factor activity in human B cells. Nat. Biotechnol. 27, 829–839.
Weir, B.A., Woo, M.S., Getz, G., Perner, S., Ding, L., Beroukhim, R., Lin, W.M., Province, M.A., Kraja, A., Johnson, L.A., et al. (2007). Characterizing the cancer genome in lung adenocarcinoma. Nature 450, 893–898.
Whitehurst, A.W., Bodemann, B.O., Cardenas, J., Ferguson, D., Girard, L., Peyton, M., Minna, J.D., Michnoff, C., Hao, W., Roth, M.G., et al. (2007). Synthetic lethal screen identification of chemosensitizer loci in cancer cells. Nature 446, 815–819.

Cell 144, March 18, 2011 ©2011 Elsevier Inc.
Leading Edge Primer
Modeling the Cell Cycle: Why Do Certain Circuits Oscillate?
James E. Ferrell, Jr.,1,2,* Tony Yu-Chen Tsai,1,2 and Qiong Yang1 1Department of Chemical and Systems Biology 2Department of Biochemistry Stanford University School of Medicine, Stanford, CA 94305-5174, USA *Correspondence: [email protected] DOI 10.1016/j.cell.2011.03.006
Computational modeling and the theory of nonlinear dynamical systems allow one to not simply describe the events of the cell cycle, but also to understand why these events occur, just as the theory of gravitation allows one to understand why cannonballs fly in parabolic arcs. The simplest examples of the eukaryotic cell cycle operate like autonomous oscillators. Here, we present the basic theory of oscillatory biochemical circuits in the context of the Xenopus embryonic cell cycle. We examine Boolean models, delay differential equation models, and especially ordinary differential equation (ODE) models. For ODE models, we explore what it takes to get oscillations out of two simple types of circuits (negative feedback loops and coupled positive and negative feedback loops). Finally, we review the procedures of linear stability analysis, which allow one to determine whether a given ODE model and a particular set of kinetic parameters will produce oscillations.
In many eukaryotic cells, the cell cycle proceeds as a sequence of contingent events. A new cell must first grow to a sufficient size before it can begin DNA replication. Then, the cell must complete DNA replication before it can begin mitosis. Finally, the cell must successfully organize a metaphase spindle before it can complete mitosis and begin the cycle again. If cell growth, DNA replication, or spindle assembly is slowed down, the entire cell cycle slows. Thus, this type of cell cycle is like an "assembly line" or "succession of dominoes" (Hartwell and Weinert, 1989; Murray and Kirschner, 1989b).
However, some cell cycles are qualitatively different in terms of their dynamics. Most notable of these exceptions is the early embryonic cell cycle in the amphibian Xenopus laevis. DNA replication is not contingent upon cell growth, probably because the frog egg is so big to start with. Mitotic entry is not contingent upon completion of DNA replication, and mitotic exit is not contingent upon the successful assembly of a metaphase spindle, because the relevant checkpoints are ineffective in the context of the embryo's high cytoplasm:nucleus ratio (Dasso and Newport, 1990; Minshull et al., 1994). Lacking these contingencies, the early embryo simply pulses once every 25 min, irrespective of whether the endpoints of the cell cycle (DNA replication and mitosis) have been completed (Hara et al., 1980). Thus, this cell cycle is clock-like (Murray and Kirschner, 1989b); it behaves as if it is being driven by an autonomous biochemical oscillator.
Although many biological processes seem almost unfathomably complex and incomprehensible, oscillators and clocks are the types of processes that we might have a good chance of not just describing, but also understanding. Accordingly, much effort has gone into understanding how simple cell cycles work in model systems like Xenopus embryos and the fungi S. pombe and S. cerevisiae. This requires the identification of the proteins and genes needed for the embryonic cell cycle and the elucidation of the regulatory processes that connect these proteins and genes. Over the past three decades, enormous progress has been made toward these ends. In each case, the cell cycle is driven by a protein circuit centered on the cyclin-dependent protein kinase CDK1 and the anaphase-promoting complex (APC) (Figure 1A). The activation of CDK1 drives the cell into mitosis, whereas the activation of APC, which generally lags behind CDK1, drives the cell back out (Figure 1B). There are still some missing components and poorly understood connections, but overall, the cell-cycle network is fairly well mapped out.
But a satisfying understanding of why the CDK1/APC system oscillates requires more than a description of components and connections; it requires an understanding of why any regulatory circuit would oscillate instead of simply settling down into a stable steady state. What types of biochemical circuits can oscillate, and what is required of the individual components of the circuit to permit oscillations? Such insights are provided by the theory of nonlinear dynamics and by computational modeling.
Indeed, cell-cycle modeling has become a very popular pursuit. Hundreds of models have been published (Table 1), beginning with Kauffman, Wille, and Tyson's prescient proposal that the cell cycle of the yellow slime mold Physarum polycephalum is driven by a relaxation oscillator (Kauffman and Wille, 1975; Tyson and Kauffman, 1975). Many of the early models, and a few of the more recent models, were simple, as models in physics typically are. They consisted of a small number of ordinary differential equations relating a few time-dependent variables (e.g., protein concentrations or activities) to each other and to a few time-independent kinetic parameters. The purpose of this type of modeling is to understand in simpler, albeit more abstract, terms how and why the cell cycle works. Over time, many of the models have become more complicated and more like chemical engineering models, consisting of dozens of variables and regulatory processes. The purpose of this type of modeling is to account for and test our understanding of specific details of the system that, because of the complexity of the system, cannot always be understood through intuition. This type of detailed model has successfully accounted for the phenotypes of dozens of budding yeast mutants (Chen et al., 2004).
Both types of modeling have their place in understanding cell-cycle regulation, and both have their adherents. Modeling approaches range from simple Boolean modeling to stochastic modeling and partial differential equation modeling. However, to date, the majority of effort has focused on ordinary differential equation (ODE) modeling (Table 1), which gets at the basic solution-phase biochemistry of cell-cycle regulation.
Here, we address the question of what it takes to make a simple protein circuit like the CDK1/APC system oscillate. We will start with Boolean modeling, which provides intuition into the logic of biochemical oscillators. We then move on to ODE models, which translate this logic into chemical terms. The basic methods for analyzing ODE models of oscillators are well known in the field of nonlinear dynamics but are not so well known among biologists. We believe that it is high time that they were; after all, we biologists are studying what are probably the world's most interesting nonlinear dynamical systems. We will emphasize the basic concepts of oscillator function and, to the extent possible, keep the algebra to a minimum. For further information, the reader is directed to lucid reviews by Goldbeter (Goldbeter, 2002) and Novák and Tyson (Novák and Tyson, 2008; Tyson et al., 2003), as well as Strogatz's outstanding textbook (Strogatz, 1994).

Figure 1. Simplified Depiction of the Embryonic Cell Cycle, Highlighting the Main Regulatory Loops
(A) Cyclin-CDK1 is the master regulator of mitosis. APC-Cdc20 is an E3 ubiquityl ligase, which marks mitotic cyclins for degradation by the proteasome. Wee1 is a protein kinase that inactivates cyclin-CDK1. Cdc25 is a phosphoprotein phosphatase that activates cyclin-CDK1. Not shown here is Plk1, which cooperates with cyclin-CDK1 in the activation of APC-Cdc20.
(B) In the Xenopus embryo, the activation of CDK1 drives the cell into mitosis, whereas the activation of APC, which generally lags behind CDK1, drives the cell back out of mitosis.

Boolean Models
We begin by paring the cell cycle down to a simple two-component model in which CDK1 activates APC and APC inactivates CDK1 (Figure 2B). This is the essential negative feedback loop upon which the cell-cycle oscillator is built (Murray et al., 1989). Perhaps the simplest way to think about the dynamics of a system like this is through Boolean or logical analysis (Glass and Kauffman, 1973). Suppose that both CDK1 and APC are perfectly switch-like in their regulation; that is, they are either completely on or completely off. Then, together, the system of CDK1 plus APC has four possible discrete states (APCon/CDK1on, APCon/CDK1off, APCoff/CDK1on, and APCoff/CDK1off) (Figure 2E). Now suppose the system starts in an interphase-like state, with APCoff/CDK1off. In the first increment of time, what will happen? If the APC is off, then CDK1 turns on. Thus, we define a rule: state 1, with APCoff/CDK1off, goes to state 2, with APCoff/CDK1on. Next, the active CDK1 activates APC; thus, state 2 goes to state 3 (APCon/CDK1on). The active APC then inactivates CDK1, and state 3 goes to state 4 (APCon/CDK1off). Finally, in the absence of active CDK1, the APC becomes inactive, and state 4 goes to state 1. This completes the cycle. We can depict the dynamics of this oscillator as a diagram in "state space" (Figure 2E). The model goes through a never-ending cycle, and all of the possible states of the system are visited during each run through the cycle.
If we add one more component to the system, for example a protein like Polo-like kinase 1 (Plk1), which here we assume is activated by CDK1 and, in turn, contributes to the activation of APC (Figure 2C), then there are eight (2 × 2 × 2) possible states for the system. If we start with all of the proteins off and assume six biologically reasonable rules (for example, active CDK1 activates Plk1, active Plk1 activates APC, and active APC inactivates CDK1), once again we get a never-ending cycle of states (Figure 2F). But this time, only some of the possible states (states 1–6 in Figure 2F) lie on the cycle. The other two states (7 and 8) feed into the cycle in a manner determined by the rules we assume. Thus, no matter where the system starts, it will converge to the cycle sooner or later. The behavior of this Boolean model is analogous to "limit cycle oscillations," which we will encounter again in the next section.
With Boolean models, it is easy to obtain oscillations. Indeed, one can even get oscillations from a model with a single species (CDK1) that flips on when it is off and flips off when it is on (Figures 2A and 2D), a discrete representation of a protein that negatively regulates itself.
Table 1. Some Mathematical Models of the Eukaryotic Cell Cycle
Year | Organism/Cell Type | Type of Model | Reference
1970 | No specific organism | ODE | (Sel’kov, 1970)
1974 | No specific organism | ODE | (Gilbert, 1974)
1975 | Physarum polycephalum | ODE | (Kauffman and Wille, 1975)
1975 | Physarum polycephalum | ODE | (Tyson and Kauffman, 1975)
1991 | Xenopus laevis embryos | ODE | (Goldbeter, 1991)
1991 | Xenopus embryos | ODE | (Norel and Agur, 1991)
1991 | Xenopus embryos, somatic cells | ODE | (Tyson, 1991)
1992 | Xenopus embryos | ODE | (Obeyesekere et al., 1992)
1993 | Xenopus embryos | ODE | (Novak and Tyson, 1993a)
1993 | Xenopus embryos | ODE | (Novak and Tyson, 1993b)
1994 | Xenopus embryos | ODE, delay differential equations | (Busenberg and Tang, 1994)
1996 | Xenopus embryos | ODE | (Goldbeter and Guilmot, 1996)
1997 | S. pombe | ODE | (Novak and Tyson, 1997)
1998 | S. pombe | ODE | (Novak et al., 1998)
1998 | Xenopus embryos | ODE | (Borisuk and Tyson, 1998)
1999 | Mammalian somatic cells | ODE | (Aguda and Tang, 1999)
2003 | Xenopus embryos | ODE | (Pomerening et al., 2003)
2003 | S. cerevisiae | ODE | (Ciliberto et al., 2003)
2004 | S. cerevisiae | ODE | (Chen et al., 2004)
2004 | S. cerevisiae | Boolean | (Li et al., 2004)
2004 | S. pombe | Stochastic | (Steuer, 2004)
2005 | Xenopus embryos | ODE | (Pomerening et al., 2005)
2006 | Mammalian somatic cells | Delay differential equations | (Srividhya and Gopinathan, 2006)
2006 | S. cerevisiae | Stochastic | (Zhang et al., 2006)
2007 | S. cerevisiae | Stochastic | (Braunewell and Bornholdt, 2007)
2007 | S. cerevisiae | Stochastic | (Okabe and Sasai, 2007)
2007 | S. cerevisiae | Hybrid | (Barberis et al., 2007)
2008 | Xenopus embryos | ODE | (Tsai et al., 2008)
2008 | S. cerevisiae | Stochastic | (Ge et al., 2008)
2008 | S. cerevisiae | Stochastic | (Mura and Csikász-Nagy, 2008)
2008 | S. pombe | Boolean | (Davidich and Bornholdt, 2008)
2008 | Mammalian somatic cells | ODE | (Yao et al., 2008)
2009 | Mammalian somatic cells | ODE | (Alfieri et al., 2009)
2010 | S. cerevisiae | ODE | (Charvin et al., 2010)
2010 | S. cerevisiae, S. pombe | Boolean | (Mangla et al., 2010)
2010 | S. pombe | ODE | (Li et al., 2010)
Figure 2. Boolean Models of CDK1 Regulation
(A–C) Schematic representation of negative feedback loops composed of one (A), two (B), or three (C) species.
(D–F) Trajectories in state space for Boolean models of these three negative feedback systems. Solid lines represent limit cycles; dashed lines (in F) connect the states off the limit cycle to the limit cycle.

ODE Models of the CDK1/APC System
Although Boolean analysis is simple and appealing, it is not completely realistic. First, all three Boolean models with negative feedback loops (Figures 2A–2C) yielded oscillations, even though we know that real negative feedback loops do not always oscillate. The problem lies in the simplifying assumptions that underpin Boolean analysis: discrete activity states and discrete time steps. Even if individual CDK1 and APC molecules actually flip between discrete on/off states, a cell contains many CDK1 and APC molecules, and they would not be expected to all flip simultaneously.
The framework for describing the dynamics of such a system is chemical kinetic theory, and, assuming that the numbers of CDK1 and APC molecules are large, the activation and inactivation of CDK1 and APC can be described by a set of differential equations. Here, we will build up an ODE model of the system, starting with a one-ODE model, which fails to produce oscillations. We then add complexity to the ODEs until the model succeeds in producing sustained, limit cycle oscillations.

A One-ODE Model
By definition, the rate of change of active CDK1 (denoted CDK1*) is the rate of CDK1 activation minus the rate of CDK1 inactivation. For simplicity, we will assume that CDK1 is activated by the rapid, high-affinity binding of cyclin, which is being synthesized at a constant rate α1 (Equation 1, first term). For CDK1 inactivation, we will assume mass action kinetics (Equation 1, second term). This gives us the first-order differential equation:

$$\frac{d[\mathrm{CDK1}^*]}{dt} = \alpha_1 - \beta_1\,[\mathrm{CDK1}^*]\,[\mathrm{APC}^*] \qquad \text{[Equation 1]}$$

There are two time-dependent variables, CDK1* and APC*. To allow the system to be described by an ODE with a single time-dependent variable (Figure 3A), we assume that the activity of APC is regulated rapidly enough by CDK1* that it can be considered an instantaneous function of CDK1*. What functional form should we use for APC's response function? Here, we will assume that APC's response to CDK1* is ultrasensitive (sigmoidal in shape, like the response of a cooperative enzyme) and that the response is described by a Hill function. This assumption is reasonable because APC activation is a multistep process; multistep processes often yield ultrasensitive, sigmoidal responses; and, for our purposes, the Hill equation with a Hill coefficient (n) greater than 1 can be thought of as a generic sigmoidal function. Substituting a Hill function for APC* in Equation 1, we get a one-ODE model of a negative feedback loop:

$$\frac{d[\mathrm{CDK1}^*]}{dt} = \alpha_1 - \beta_1\,[\mathrm{CDK1}^*]\,\frac{[\mathrm{CDK1}^*]^{n_1}}{K_1^{n_1} + [\mathrm{CDK1}^*]^{n_1}} \qquad \text{[Equation 2]}$$

We now choose, somewhat arbitrarily, values for the model's parameters (α1 = 0.1, β1 = 1, K1 = 0.5, n1 = 8) and initial conditions (CDK1*[0] = 0). We can then numerically integrate Equation 2 over time and see how the concentration of activated CDK1* evolves.
As shown in Figure 3C, the system moves monotonically from its initial state toward a steady state; there is no hint of oscillation. This monotonic approach to steady state is observed no matter what we assume for the parameters and initial conditions.
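A minimal sketch of the numerical integration of Equation 2 just described, written in Python with no third-party dependencies. The fixed-step fourth-order Runge-Kutta integrator, the step size, the bisection search for the steady state, and all function names are our own choices for illustration, not the authors'; any standard ODE solver would serve. The parameter values are the ones given in the text.

```python
def hill(x, K, n):
    """Sigmoidal Hill function x**n / (K**n + x**n)."""
    return x**n / (K**n + x**n)


# Parameter values from the text and the Figure 3 caption.
ALPHA1, BETA1, K1, N1 = 0.1, 1.0, 0.5, 8


def dcdk1_dt(cdk1):
    """Right-hand side of Equation 2: cyclin-driven activation minus
    inactivation by an APC* that tracks CDK1* instantaneously."""
    return ALPHA1 - BETA1 * cdk1 * hill(cdk1, K1, N1)


def rk4(f, x0, dt=0.01, t_end=50.0):
    """Integrate dx/dt = f(x) with the classical fourth-order Runge-Kutta
    method and return the whole trajectory."""
    xs = [x0]
    x = x0
    for _ in range(int(round(t_end / dt))):
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        xs.append(x)
    return xs


def steady_state(lo=0.0, hi=1.0, tol=1e-12):
    """Bisect for dCDK1*/dt = 0. For CDK1* >= 0 the right-hand side falls
    monotonically from ALPHA1, so there is exactly one root in [0, 1]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dcdk1_dt(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


trajectory = rk4(dcdk1_dt, 0.0)
# Allow for round-off once the trajectory has converged.
monotonic = all(b >= a - 1e-12 for a, b in zip(trajectory, trajectory[1:]))
print(f"steady state from bisection: {steady_state():.3f}")
print(f"CDK1* at t = 50: {trajectory[-1]:.3f}, monotonic rise: {monotonic}")
```

Both numbers should come out close to the CDK1* ≈ 0.43 steady state of Figure 3, with a monotonic rise as in Figure 3C. This is guaranteed for any parameter choice: a one-variable autonomous ODE can only move a trajectory toward or away from a steady state along a line, never around it, so it cannot oscillate.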
Figure 3. A Model of CDK1 Regulation with One Differential Equation
(A) Schematic of the model. The parameters chosen for the model were α1 = 0.1, β1 = 1, K1 = 0.5, and n1 = 8.
(B) Trajectories in one-dimensional phase space, approaching a stable steady state (designated by the filled circle) at CDK1* ≈ 0.43.
(C) Time course of the system, starting with CDK1*[0] = 0 and evolving toward the steady state.
Figure 4. A Two-ODE Model of CDK1 and APC Regulation
(A) Schematic of the model. The parameters chosen for the model were α1 = 0.1, α2 = 3, β1 = 3, β2 = 1, K1 = 0.5, K2 = 0.5, n1 = 8, and n2 = 8.
(B) Phase space depiction of the system. The red and green curves are the two nullclines of the system, which can be thought of as the steady-state response curves for the two individual legs of the feedback loop. The filled black circle at the intersection of the nullclines (with CDK1* ≈ 0.42 and APC* ≈ 0.37) represents a stable steady state. One trajectory is shown, starting at CDK1*[0] = 0, APC*[0] = 0, and spiraling in toward the stable steady state.
(C) Time course of the system, showing damped oscillations approaching the steady state.
we have not yet built an oscillator model. Even though we were able to produce sustained oscillations with a one-variable Boolean model of a negative feedback loop (Figures 2A and 2D), translating the model into a differential equation eliminated the oscillations.
Another way of representing the system's behavior is through a phase plot, which shows all possible activity states of the system. This is similar to the state-space plots that we used for the Boolean analysis, but instead of having a few discrete states, the phase plot displays a continuum, showing that the system moves between states smoothly (as we would expect, given that the numerous CDK1 molecules do not all activate simultaneously but turn on “smoothly”). The phase plot contains one dimension for each time-dependent variable. Therefore, in this one-variable model, the phase plot possesses one axis, representing the concentration of activated CDK1 (CDK1*) (Figure 3B). In addition, the system's phase plot shows one stable steady state with CDK1* ≈ 0.43. If the system starts off with CDK1* less than or greater than 0.43, it will move along a trajectory back to 0.43. In other words, any initial condition to the left or right of the steady state yields a trajectory moving to the right or left, respectively.
A Two-ODE Model
Why did the one-variable Boolean model produce oscillations (Figures 2A and 2D), whereas the one-ODE model (Equation 2) did not (Figure 3)? The discrete time steps of the Boolean model help to segregate CDK1 activation from inactivation in time. Thus, perhaps adding another ODE (Figure 4A), which acknowledges the fact that APC regulation is not instantaneous, might allow us to generate oscillations.
First, we write an ODE for the activation and inactivation of CDK1 (Equation 3). We once again assume that CDK1 is activated by a constant rate of cyclin synthesis (a1). We assume that the multistep process through which APC* inactivates CDK1* is described by a Hill function. The inactivation rate is therefore proportional to the concentration of CDK1* (the substrate being inactivated) times a Hill function of APC*.
Now for APC (Equation 4), we assume that its rate of activation by CDK1* is proportional to the concentration of inactive APC (which, assuming the total concentration of active and inactive APC to be constant, we take to be 1 − APC*) times a Hill function of CDK1*, and that the rate of inactivation of APC* is described by simple mass action kinetics. The resulting two-ODE model is:
$$\frac{d\,\mathrm{CDK1}^*}{dt} = a_1 - b_1\,\mathrm{CDK1}^*\,\frac{(\mathrm{APC}^*)^{n_1}}{K_1^{n_1} + (\mathrm{APC}^*)^{n_1}} \qquad \text{[Equation 3]}$$
$$\frac{d\,\mathrm{APC}^*}{dt} = a_2\,(1 - \mathrm{APC}^*)\,\frac{(\mathrm{CDK1}^*)^{n_2}}{K_2^{n_2} + (\mathrm{CDK1}^*)^{n_2}} - b_2\,\mathrm{APC}^* \qquad \text{[Equation 4]}$$
Again, we choose kinetic parameters and initial conditions (as described in the caption to Figure 4) and integrate the ODEs numerically. The results are shown in Figures 4B and 4C. The CDK1 activity initially rises as the system moves from interphase (low CDK1 activity) toward M phase (high CDK1 activity) (Figure 4C). After a lag, the APC activity begins to rise too. Then, the rate of CDK1 inactivation (driven by APC activation) exceeds the rate of CDK1 activation (driven by cyclin synthesis), and the CDK1 activity starts to fall. After a few wiggles up and down, the system approaches a steady state with intermediate levels of both CDK1 and APC activities. Thus, we have generated damped oscillations, but not sustained oscillations.
Figure 4B shows the phase space view of these damped oscillations. The phase space is now two dimensional because there are two time-dependent variables. There is a stable steady state that sits at the intersection of two curves called the nullclines (green and red curves, Figure 4B). These two nullclines can be thought of as stimulus-response curves for the two individual legs of the CDK1/APC system. The red nullcline (defined by the equation dCDK1*/dt = 0) represents what the steady-state response of CDK1* to constant levels of APC activity would be if there were no feedback from CDK1* to APC* (Figure 4B). The green nullcline (defined by dAPC*/dt = 0) represents what the steady-state response of APC* to CDK1* would be if there were no feedback from APC* to CDK1* (Figure 4B).
For the whole system to be in steady state, both time derivatives must be zero. Thus, the steady state for the entire system lies where the two nullclines intersect. The steady state is stable, and the trajectory of the system (black curve) spirals in from the initial values of CDK1* and APC* toward the stable steady state (Figure 4B).
A Three-ODE Model
Perhaps we can convert the damped oscillations into sustained ones by adding a third species to the model, which would increase the lag between CDK1 activation and APC activation (Figure 4C). Here, we will add Plk1 back into the model, as we did in the three-component Boolean model (Figure 2C), with Plk1 assumed to act as an intermediary between CDK1 and APC. We now have three ODEs (Equations 5–7). The equation for the activation and inactivation of CDK1 stays the same (Equation 5). The activation of Plk1 by CDK1* is proportional to the concentration of inactive Plk1 (1 − Plk1*) times a Hill function of CDK1*, and the inactivation is proportional to Plk1* (Equation 6). A similar logic for the activation and inactivation of APC gives Equation 7.
$$\frac{d\,\mathrm{CDK1}^*}{dt} = a_1 - b_1\,\mathrm{CDK1}^*\,\frac{(\mathrm{APC}^*)^{n_1}}{K_1^{n_1} + (\mathrm{APC}^*)^{n_1}} \qquad \text{[Equation 5]}$$
$$\frac{d\,\mathrm{Plk1}^*}{dt} = a_2\,(1 - \mathrm{Plk1}^*)\,\frac{(\mathrm{CDK1}^*)^{n_2}}{K_2^{n_2} + (\mathrm{CDK1}^*)^{n_2}} - b_2\,\mathrm{Plk1}^* \qquad \text{[Equation 6]}$$
help to generate a time delay in the negative feedback, which helps to keep the system from settling into a stable steady state.
878 Cell 144, March 18, 2011 ©2011 Elsevier Inc.
Figure 5. A Three-ODE Model of CDK1, Plk1, and APC Regulation
(A) Schematic of the model. The parameters chosen for the model were a1 = 0.1, a2 = 3, a3 = 3, b1 = 3, b2 = 1, b3 = 1, K1 = 0.5, K2 = 0.5, K3 = 0.5, n1 = 8, n2 = 8, and n3 = 8.
(B) Phase space depiction of the system. The two colored surfaces are two of the three null surfaces of the system; for clarity, we have omitted the third. The open circle at the intersection of the null surfaces (with CDK1* ≈ 0.43, Plk1* ≈ 0.42, and APC* ≈ 0.37) represents an unstable steady state (or unstable spiral). One trajectory is shown, starting at CDK1*[0] = 0, Plk1*[0] = 0, APC*[0] = 0, and spiraling in toward the limit cycle.
(C) Time course of the system, showing sustained limit cycle oscillations.
Linear Stability Analysis
So far, we have confined ourselves to analyzing ODE models through simulations. This provides an intuitive feel for the behavior of a system, but of course, it is never possible to try all possible values for the kinetic parameters or all possible initial conditions. Is there a way to explain theoretically, rather than computationally, why the one-ODE model failed to oscillate at all, the two-ODE model at best yielded damped oscillations, and the three-ODE model finally yielded sustained oscillations? The answer is yes, and probably the most straightforward approach is “linear stability analysis.” Linear stability analysis is quite remarkable. It assesses the stability of the steady states of the system and, almost magically, allows the dynamics of the system to be characterized even when the system is far from steady state.
To get started with linear stability analysis, we will analyze the steady state of the one-ODE model described in Equation 2.
Linear Stability Analysis of the One-ODE Model
For notational simplicity, we will refer to the rate of change of CDK1* (dCDK1*/dt) as f. This function f can be thought of as a function of CDK1*, which in turn is a function of time. In terms of f, Equation 2 becomes: