TH E ADAPTIVE LANDSCAPE in Evolutionary Biology CHAPTER17 :h and the liutionary

)id forest High-Dimensional Adaptive ed butter­ osophicai Landscapes Facilitate Evolutionary Sciences, Innovation hoiogicai Andreas Wagner f Popuia­ iutionary 'go,IL.

17.1 Introduction one peak to the next, one needs to cross a mal­ adaptive valley-the valley of death, if you will­ The Adaptive Landscape is one of the most influen­ with no detour around this valley. This changes in tial concepts in evolutionary biology (Wright 1932, higher dimensional landscapes, where, counterin­ Futuyma 1998, see also Chapter 2 of this volume). It tuitively, "extra dimensional bypasses" around mal­ is commonly visualized as a surface of rolling hills adaptive valleys exist (Conrad 1990). Through such or rugged mountains in a three-dimensional space. bypasses, one can h'avel from peak to peak while Two of the three dimensions represent allele fre­ avoiding valleys of death. Gavrilets pointed out quencies or-relevant for my purpose-genotypes. that this principle has implications for the evolu­ The third dimension represents fitness. The land­ tion of reproductive isolation (Gavrilets 1997, 2005). scape's peaks represent adaptive trait combinations I here explain that it also has implications for how or genotypes. The Adaptive Landscape concept biological systems llU1ovate. has been highly successful, so much so that it has hU1ovations in evolving biological systems are spawned multiple variants in the h~nds of differ­ qualitatively new and beneficial new . ent authors, including holey landscapes (Gavrilets High-dimensional Adaptive Landscapes can facil­ 1997) and landscapes. The variant I itate such llU1ovations. In the next section I dis­ emphasize here can be viewed as a phenotype land­ cuss evidence for this assertion for three classes scape (see also Chapter 18 of this volume for differ­ of systems important for evolutionary llU1ovation. ent kinds of phenotype landscapes). These are metabolic networks, gene regulation cir­ I view the Adaptive Landscape as a metaphor cuits, as well as protein and RNA macromolecules. for the evolutionary process. It is an abstraction Detailed studies of the high-dimensional genotype derived from an immensely complex reality. Such spaces of these systems have demonstrated the abstraction is necessary for all human understand­ existence of vast genotype networks. These are ing of the world around us. Yet like all abstractions, connected sets of genotypes with the same phe­ it also has limitations. Several of these limitations notype that extend far through genotype space. are caused by the high dimensionality of geno­ Genotype networks are important for the abil­ type spaces. This high dimensionality has already ity of biological systems to explore many novel been appreciated by Sewall Wright, the creator phenotypes. Genotype networks can be viewed of the Adaptive Landscape concept (Wright 1932). as a consequence of the high-dimensionality of Its consequences have been shldied at least since genotype spaces. I will begin illustrating this fea­ the late 1980s (Kauffman and Levin 1987; Con­ ture in some detail with metabolic networks, and rad 1990; Gavrilets 1997). One important such con­ then discuss the other system classes more briefly. sequence regards a basic geometric intuition we Table 17.1 summarizes some of the concepts I will derive from three-dimensional space. To get from introduce.

271 272 PART V DEVELOPMENT, FORM, AND FUNCTION

Table 17.1 Important concepts for the three system classes discussed in this chapter.

Genotype Neighbors Genotype space Phenotype

Metabolic Network DNA encoding all metabolic Networks that differ in All possible metabolic Ability to synthesize biomass or enzymes/reactions in a one enzyme (reaction) networks other important molecules from a metabolism given spectrum of nutrients Regulatory circuit DNA encoding regulatory Circuits that differ in one All possible regul atory Gene expression or molecular interactions among molecules regulatory interaction circuits activity of all ci rcuit molecules Macromolecule Amino acid or nucleotide Polymers that differ in a All possible amino Tertiary structure (fold) and (protein/RNA) polymers single monomer acid/nucleotide polymers biochemical activity

17.2 Metabolic network space possible metabolic networks and their biosynthetic Metabolic networks are comprised of hundreds abilities. To do so, it is necessary to define a space , to thousands of chemical reactions-catalyzed by of possible metabolic genotypes. The metabolic r enzymes that are encoded by genes-which synthe­ genotype of anyone organism is the part of (. size all small molecules in biomass from environ­ the organism's genome that encodes the enzymes mental nutrients. In addition, they produce energy which catalyze metabolic reactions. Although this 1 and many important secondary metabolites. Such genotype is a string of DNA, it is useful to represent t networks are involved in many innovations, from it in a more compact way as follows. Consider the microbes to higher organisms. Examples include known "universe" of enzyme-catalyzed chemical A the ability of microbes to grow on synthetic antibi­ reactions. This muverse currently comprises more y' otics or other toxic xenobiotic compounds, such . than 5000 reactions and the associated enzymes. II as polychlorinated biphenyls, chlorobenzenes, or Write the names of these reactions or their stoicluo­ Fe pentachlorophenol, in some cases using them as metric equations as a list. For anyone organism, 2( their sole carbon source (Cline et al. 1989; van write a one next to the reaction in this list if its 111 der Meer 1995; van der Meer et al. 1998; Copley genome encodes an enzyme catalyzing this reac­ 2000; Dantas et al. 2008; Rehmann and Daugulis tion, and a zero if it does not. The result will be a hE 2008). They also include the urea cycle, a metabolic long string of ones and zeroes that can represent the all innovation of land-living animals that allows them metabolic genotype of the organism (Rodrigues and nl: to convert toxic ammonia into urea for excretion. Wagner 2009). ad Novel metabolic traits often involve new combina­ In tlus representation, a metabolic genotype can po tions of chemical reactions (enzymes) in an organ­ be viewed as a single point in a vast metabolic CaJ ism that already exist separately elsewhere. For genotype space, a space of possible metabolic net­ thE example, a novel metabolic pathway to degrade works. This space contains all possible pres­ an. pentachlorophenol involves four steps that its host ence/absence combinations of enzyme-catalyzed tat organism assembled- probably through horizontal reactions. Each such combination constitutes a are gene transfer- from enzymes processing naturally metabolic network. For a universe of 5000 reactions, ChE occurring chlorinated chemicals, as well as from there are 25000 such metabolic networks, a hyper­ wh an enzyme involved in tyrosine metabolism (Cop­ astronomical number, many more than could be frQ] ley 2000). Similarly, the urea cycle arose when four realized in organisms on earth. Metabolic genotype can widespread enzymatic reactions involved in argi­ space can also be viewed as the set of all binary gell nine biosynthesis combined with arginase, a reac­ strings of length 5000. Yet anotl1er, geometric rep­ T tion involved in arginine degradation (Takiguchi resentation is that of a 5000-dimensional hypercube qua et al. 1989). graph. Tlus is a graph whose vertices-the vertices Thi.: To study innovation in metabolic networks sys­ of an n-dimensional cube-are the individual geno­ ablE tematically, one needs to be able to represent all types. Two genotypes are connected by an edge if pan HIGH-DIMENS IONAL ADAPTIVE LANDSCAP ES FAC ILITATE EVOLUTIONARY INNOVATION 27 3

they are (I-mutant) neighbors, that is, if they differ metabolic phenotype. For example, one could list in a single chemical reaction. the essential biomass molecules that a metabolic This genotype space is a prototypical exam­ network is able to synthesize iI1 anyone chemical mthesize biomass or ple of a high-dimensional space. Each of its axes environment. However, unless a metabolic network nant molecules from a (reactions) corresponds to one dimension. In con­ can synthesize all essential molecules, it may not rum of nutrients h'ast to our low-dimensional Euclidean space, it be able to sustain life. This definition is therefore ssion or molecular is a discrete and not a continuous space, con­ of liInited use. To study metabolic umovations, and " circuit molecules taining an enumerable number of elements. It is especially umovations that allow survival on novel Icture (fold) and thus quite different from Euclidean space. How­ I activity sources of energy and carbon, the followu1g defini­ ever, it also shares some similarities with this space. tion is more useful. This definition focuses on car­ The most important of them is that in genotype bon, because carbon is a central element u1life, and space intuitive measures of the distance between because umovations involving carbon metabolism two metabolic genotypes exists. One such mea­ are thus especially important. Howevel~ what I ir biosynthetic sure is simply the fraction of metabolic reactions in describe also applies to other metabolic iImovations define a space which two genotypes differ. In mathematical terms, (Rodrigues and Wagner 2011). ['he metabolic metabolic genotype space is thus a metric space Consider a minimal chemical environment that : the part of (Searcoid 2007). contau1S a small number of molecules which can ; the enzymes serve as a source of all necessary elements except Although this carbon. For u1stance, U1 the case of E. coli, this 17.3 From metabolic genotype ~ to represent environment comprises only six different kinds to phenotype Consider the of molecules. Now make a list of many different ·zed chemical A free-living organism such as Escherichia coli or potential sources of carbon and energy, such as glu­ mprises more yeast needs to synthesize some 50 essential biomass cose, ethanol, glycerol, and so forth. For the sake ted enzymes. molecules to grow and divide (Forster et al. 2003; of the argument, let us consider 100 such sources. their stoichio­ Feist et al. 2007). These molecules include all For each of these sources, when provided in an ne organism, 20 proteinaceous amino acids, RNA and DNA otherwise minimal environment as the sale carbon this lis t if its nucleotides, lipids, and enzyme co-factors. source, determu1e whether a given metabolic net­ ng this reac­ To sustain life, the metabolic network of a work can synthesize all essential biomass metabo­ sult will be a heterotrophic organism needs to generate energy lites. If so, write a one next to the list of carbon represent the and synthesize all these molecules from a limited sources. If not, write a zero. Define the resultu1g odrigtles and number of chemicals in the environment. Recent string as the metabolic phenotype of this metabolic advances in computational methods have made it network. It represents the set of carbon sources on senotype can possible to compute whether a metabolic network which the metabolic network can sustain life, on 1St metabolic can do so, including the rate at which it can syn­ which it is viable (Rodrigues and Wagner 2009). .etabolic net­ thesize each compound (Schilling et al. 1999; Feist This definition of a metabolic phenotype is well­ )ssible pres­ and Palsson 2008; Feist et al. 2009). For this compu­ suited for a systematic analysis and comparison of ne-catalyzed tation, one needs two kinds of information. These phenotypes, includu1g metabolic innovations. First, :onstitutes a are the stoichiometric equations for each of the it encapsulates an astronomical number of different 100 reactions, chemical reactions in the network, and the rate at phenotypes (2100 for 100 carbon sources). Second, ks, a hyper­ which an organism can import necessary chemicals this notion of phenotype makes it easy to compare an could be from the environment. Given this approach, one different phenotypes by comparu1g their associated genotype can compute metabolic phenotypes from metabolic bU1ary stru1gs. Third, an evolutionary innovation­ all binary genotypes. viability on new carbon sources-simply corre­ To study metabolic iImovation is to study how sponds to a phenotype string where one or more qualitatively novel metabolic phenotypes arise. zeroes are converted to ones. It fits the definition This requires a definition of phenotype that is suit­ of an iImovation as a new phenotype that can geno­ able for a systematic analysis, and suitable to com­ make a qualitative difference to survival U1 the an edge if pare phenotypes. There are many ways to define a right environments, namely environments where 274 PART V DEVELOPMENT, FORM, AND FUNCTION this carbon source is the only available source. Such Second, genotypes with the same phenotype a novel phenotype can arise by adding reactions form vast cOlmected genotype networks that reach to a metabolic network, for example by adding far through genotype space. This means that one enzyme-coding genes to a genome via horizontal can step from one genotype to its neighbor, to gene h·ansfer. the neighbor's neighbor, and so on, without ever changing a phenotype. A genotype network can be viewed as a network of metabolic networks in genotype space. Two genotypes that are far apart 17.4 Exploring metabolic genotype on this network have the same phenotype but may space share fewer than 25% of their chemical reactions To characterize metabolic genotype space exhaus­ (Rodrigues and Wagner 2009). tively is impossible, but one can obtain much Third, the neighborhood of any two genotypes on insight into the organization of this space by the same genotype network contains very different carefully designed random sampling. One rele­ novel phenotypes (Fig. 17.1). Together, these prop­ vant class of approach is Markov chain Monte erties facilitate the evolution of novel phenotypes Carlo sampling, which involves random walks through exploration of genotype space. They allow through genotype space (Rodrigues and Wagner a population to keep its phenotype unchanged 2009; Sarna I et al. 2010). Such random walks mod­ while exploring different regions of genotype space ify a starting network in a series of steps, each of and many novel phenotypes therein. which either eliminates a reaction (such as might Genotype networks are not peculiarities ' of car­ occur through a loss-of-ftmction in an bon metabolism. They also occur in the metabolism l' enzyme coding gene) or adds a reaction (such as of other elements (Rodrigues and Wagner 2011). tl might occur through horizontal gene transfer). It is I note in passing that they have very siInilar proper­ I( useful to require that each step of such a random ties if one requires that each random walk preserves sl the rate at which a network can synthesize biomass, sl walk preserves the phenotype. During this random Ir, walk, one can also determine all metabolic geno­ and not just the mere ability to synthesize biomass types in the immediate neighborhood of the ran­ (Samal et al. 2010). dom walking network, determine their metabolic el phenotypes, and ask which of them are novel e; 17.5 Regulatory circuits and novel gene metabolic phenotypes that allow survival on new expression patterns ir carbon sources. This neighborhood is of special m interest, because it contains all novel metabolic phe­ The second system class I will briefly discuss here in notypes that are eaSily accessible from a network are regulatory circuits. They are systems of inter­ m through changes in a Single reaction. acting gene products that influence each other's et We have carried out such random walks from biological activity. Their phenotypes are gene Lc different starting points, with very different start­ expression phenotypes or, more generally, molec­ ing metabolic phenotypes, and explored llmova­ ular activities of gene products with important ciI tions ll1 the utilization of carbon and other elements biological functions. Such circuits are llwolved inl (Rodrigues and Wagner 2009; Samal et al. 2010). whenever cells and tissues communicate, and ex, Together, these analyses have revealed some strik­ whenever gene expression is regulated (Gilbert va­ ingly siInple prll1ciples of metabolic genotype space 1997; Carroll et al. 2001). Both processes are indis­ ule organization. pensable for the development of any multicellular (Kl First, individual metabolic genotypes typically organism, and thus for the formation of all macro­ ill have many neighbors with the same metabolic phe­ scopic phenotypes. The most important kinds of cir­ prE notype. In other words, the metabolic phenotypes cuits are transcriptional regulation circuits, because wh of these metabolic networks are to some extent transcriptional regulation provides a regulatory eXF robust to that llwolve changes ll1 individ­ backbone of all organismal life, and because such (Ca ual reactions. circuits drive many pattern formation processes in exii HIGH · DIMENS IONA L ADA PTI VE LAN DSCAP ES FA CILIT ATE EV OL UT IONARY INNO VATION 275 lotype : reach at one >or, to :t ever :k can Irks in apart Figure 17.1 Genotype networks in genotype space. Ltmay The large rectangle schematically represents a genotype space. lCtions Open circles represent genotypes with the same (hypothetical) phenotype; neighboring genotypes are connected by straight lines. The figure shows that the open circles form a large peson connected network that extends far through genotype space. 'ferent The figure also contains symbols of variou s shapes and prop­ shading. Each such symbol represents a genotype with a ltypes different new phenotype, where this genotype is adjacent to the genotype network. That is, it can be reached through a allow single, small genetic change from some genotype on this anged network. The figure illustrates that many different novel space phenotypes can be accessed from a connected genotype network that spreads far through genotype space. The usual )f car­ caveat that two·dimensional images poorly represent high·dimensional spaces applies. For example, each genotype >olism typically has hundreds to thousands of neighbors, many more 2011) . than can be shown here. Also, each genotype that is not on the roper- focal genotype network (symbols of different shape and 3erves shading) is also part of a large genotype network that is not )mass, shown here. Figure from Wagner (2011), used with permission from Oxford University Press. amass

embryonic development. Among the best known functions) are associated with the formation of examples are Hox gene circuits involved in pattern­ novel phenotypes. ing limbs and many other body structures in ani­ Because transcriptional regulation circuits are mals, as well as circuits involving MADS-box genes central to embryonic development and to regu­ important in patterning flowers (Hughes and Kauf­ latory innovations, many such circuits are well­ man 2002; Irish 2003; Wagner et al. 2003; Causier studied individually (Carroll et al. 2001; Davidson et al. 2005; Lemons and McGinnis 2006; Hueber and and Erwin 2006). However, a systematic under­ Lohmann 2008). standing of novel regulatory phenotypes requires Regulatory change in transcriptional regulation analysis of not one circuit, but a systematic anal­ circuits and other circuits is involved in form­ ysis of thousands of circuits in the genotype space ing many new' novel macroscopic phenotypes. For of such circuits. For this purpose, computational example, the formation of dissected leaves, an inno­ models of such circuits are currently indispensable vation of some plants that may aid thermoreg­ (von Dassow et al. 2000; Albert and Othmer 2003; ulation, is driven by overexpression of KNOX MacCarthy et al. 2003; Jaeger et al. 2004; Sanchez (KNOTTED1-like homeobox) transcription factors et al. 2008). in developing leaves (Bharathan et al. 2002). The Models that lend themselves to an exploration of predator-deterring eye-spots of butterflies form a circuit space represent the topology of such a cir­ where the transcription factor Distal-less is over­ cuit, that is, the pattern of activating and inhibiting expressed. In these and many other examples regulatory interactions, in a systematic way (Mac­ (Carroll et al. 2001), changes in the regulation of Carthy et al. 2003; Wagner 2005a; Ciliberti et al. existing molecules (that also serve other, ancestral 2007a, 2007b). One can think of a circuit's pattern 276 PART V DEVELOPMENT, FORM, AND FUNCTION of regulatory interactions-mediated by the DNA some animals to survive temperahtres where nor­ that encodes transcription factors and cis-regulatory mal body fluids would freeze. This ability is caused regions-as the regulatory genotype of such a cir­ by antifreeze proteins, evolutionary umovations cuit (Fig. 17.1). For a transcriptional regulation that arose rapidly, multiple times u1dependently, circuit, this genotype describes which genes acti­ and from different ancestors in arctic and Antarctic vate or repress each other's expression, and how fish (Chen et a1. 1997; Cheng 1998). Another exam­ strongly they do so. Each circuit genotype exists ple uwolves the ability of the bar-headed goose in a space of possible such genotypes. This space (Ansel' indicus) to migrate over the Himalayas at alti­ contains circuits with all possible patterns of regula­ tudes exceeding 10 kilometers (Golding and Dean tory interactions between circuit genes. Two circuits 1998; Liang et al." 2001). In this bird, one of the sub­ are neighbors in this space if they differ in exactly lmits of hemoglobu1 experienced a sU1g1e prolu1e to one regulatory interaction. The distance between alanine substihttion. This substitution increases the two circuits is the number or fraction of regula­ proteu1's affuuty to oxygen, and thus allows it to tory interactions in which they differ. Two circuits transport oxygen at lower oxygen concentrations. It are maximally different, if they have no regulatory is one of several changes that make the bar-headed interactions in common. The regulatory interactions goose one of the lughest flying birds. specified in a regulatory genotype determine the The genotype space of macromolecules is the circuit's phenotype. This phenotype reflects the activ­ space of all possible nucleotide or amino acid ity or expression level of each gene in anyone sequences (Table 17.1). Its structure has been shtd­ cell, which can be represented either continuously ied for many years (Lipman and Wilbur 1991; or discretely ("on" or "off"). This highly simpli­ Schuster et a1. 1994), and shows the same three fea­ fied discrete representation can facilitate enumera­ tures I discussed earlier for other genotype spaces. tion and comparison of different circuit phenotypes First, individual genotypes typically have many (Ciliberti et a1. 2007a, 2007b). neighbors with tl1e same phenotype (Reidys et a1. Just as in the case of metabolic systems, explo­ 1997; Sumedha et a1. 2007). For example, random ration of the genotype space of circuits shows mutagenesis studies of different proteins showed features very similar to metabolic network space that a large fraction of amino acid changes do (MacCarthy et a1. 2003; Wagner 2005a; Ciliberti not affect proteu1 fw1ction (Kleu1a and Miller 1990; et a1. 2007a, 2007b; Giurumescu et a1. 2009). First, Rennell et a1. 1991; Huang et a1. 1996). circuits typically have many neighbors in circuit Second, genotype networks exist U1 these spaces. c space with the same expression phenotype. Sec­ This has first been demonstrated for lattice proteu1 a ond, circuits with the same phenotype form vast models of proteu1 foldu1g, and later for real pro­ tJ cOlmected genotype networks (MacCarthy et a1. teu1s (Babajide et a1. 1997; Todd et a1. 1999, 2001; v 2003; Wagner 2005a; Ciliberti et a1. 2007a). Third, Bastolla et a1. 2003; Wagner 2005b). It has also been o the neighborhoods of different circuits contain very shown for secondary structure phenotypes of RNA, different novel phenotypes (Ciliberti et a1. 2007b; where genotype networks have been called neutral Giurumescu et a1. 2009). networks and are extensively characterized (Schus­ ter et a1. 1994; Schuster 2006). (I note parentheti­ cally that there are good reason not to use the term 17.6 Macromolecules neutral network in tlus context, because evolution The final system class are macromolecules-protein along a genotype network need not be neutral U1 and RNA. They form enzymes, exchange chemicals the molecular evolutionist's sense (Wagner 2008).) between cells and their environment, give struc­ Genotype networks in macromolecules typically tural support to cells, are central to locomotion and also extend far through genotype space. The differ­ transport, and perform many other essential fw1C­ ences between proteins with the same tertiary struc­ ex tions. Not surprisingly then, many adaptive phe­ ture phenotype and common ancestry, for example, so notypic changes are directly traceable to changes in can be dramatic. Such proteins may share only a in. macromolecules. One example regards the ability of few per cent of their amino acids (Goodman et a1. ti\ HIGH-D IMENSIONAL ADAPTIVE LANDSCAPES FACILITATE EVOLUTIONARY INNO VAT ION 277

1988; Rost 1997; Thornton et al. 1999; Todd et al. phenotypes, only few of which may be improve­ 1999; Copley and Bork 2000, Bastolla et al. 2003). ments over the old. Envision a population of organ­ Third, different neighborhoods of a genotype net­ isms that preserves its existing phenotype (through work harbor different novel phenotypes. This has stabilizing selection) while being exposed to muta­ first been shown for secondary strucrure pheno­ tional change. The existence of genotype networks types of RNA and more recently for proteins and means that the population can gradually change their enzymatic function phenotypes (Schuster et al. its genotype while preserving its phenotype. In 1994; Huynen et al. 1996; Sumedha et al. 2007; doing so, it can explore different regions of geno­ Ferrada and Wagner 2010). type space. The immediate neighborhood of the population will contain very different novel pheno­ types, depending on where its members are located 17.7 Genotype networks as a in genotype space. The existence of genotype net­ consequence of high dimensionality works, combined with the diversity of their neigh­ In sum, three very different classes of biological borhoods thus allows exploration of a myriad novel systems, all of them central for evolutionary llmo­ phenotypes. vation, show very similar organizations of their I note that the discretization of genotypes and genotype spaces. First, in all three system classes, phenotypes I used here serves to develop neces­ genotypes have many neighbors with the same sary concepts clearly. It also facilitates computa­ phenotype. Specifically, between 10% and more tional analysis of complex phenotypes and their than 50% of a genotype's neighbors typically have organization in genotype space. However, a small the same phenotype, depending on system class, but growing body of research hints that these con­ system size, genotype, and phenotype (Wagner cepts can be transferred to systems with continu­ 2011) . In other words, these systems are to some ous genotypes and phenotypes to Shldy how new extent robust to genetic change. Second, genotypes phenotypes arise in such systems (Wagner 2005a; with the same phenotype form vast genotype net­ Giurumescu et al. 2009; Hafner et al. 2009; Raman works that reach far through genotype space. They and Wagner 2011). typically span between 70-100% of the diame­ The genotype-phenotype relationships I dis­ ter of this space (Wagner 2011). I note that even cussed here can be viewed as functions from high­ though genotype networks are usually astronomi­ dimensional genotype spaces to high-dimensional cally large (1050 genotypes in a genotype network spaces of phenotypes. In the case of metabolic are not unusual), and even though they reach far networks, the genotypes reflect the presence or through genotypes space, anyone genotype net­ absence of metabolic reactions from a reaction work typically occupies only a vanishing fraction universe in anyone metabolic network, and the of genotype space. That these properties do not phenotypes reflect the set of carbon sources (or contradict each other results from the fact that geno­ sources of other elements) on which the metabolic type space has many dimensions and that it is network can sustain life. For regulatory circuits, vast-it has room for myriad genotype networks genotypes are the DNA sequences that encode a that are tightly interwoven (Wagner 2011). Third, circuit's regulatory interactions, and phenotypes different neighborhoods of a genotype network are activity or concentration patterns of molecules. generally contain different novel phenotypes. Fig; For macromolecules, genotypes are amino acids 17.1 shows a schematic of one such network (open and nucleotide sequences, and phenotypes cor­ circles) and some novel phenotypes in its neighbor­ respond to complex three-dimensional folds of hood (symbols of various shapes). these molecules and their biochemical ftmction The second and third fearure jointly facilitate the (Table 17.1). exploration of novel phenotypes. Specifically, they I also note that for the systems I consider here, solve a major problem that irmovation poses to liv­ there will generally be more genotypes than pheno­ ing systems: organisms need to preserve old, adap­ types. For example, for proteins that are N amino tive phenotypes while exploring irmumerable new acids long, there is an astronomical number of 20N 278 PART V DEVELOPMENT, FORM, AND FUNCTION genotypes even for moderately large N. In contrast, spond to the holes in the landscape.) But to study the number of protein tertiary structure phenotypes innovation, this will not suffice. It becomes neces­ (protein folds) is of the order 104 (Levitt 2009), sary to study phenotypes as complex, multidimen­ and for enzymes, the most prominent class of pro­ sional objects. One could say that for innovation, teins, the number of known junction phenotypes­ it is the off-ridge regions in the landscape that are the number biochemical reactions they catalyze-is most important (although they are no longer mere of the same order of magnitude (Ogata et al. 1999). holes). They represent genotypes with new pheno­ Even if these estimates were to be too low by a fac­ types, some of which may have superior fitness. tor 100 or 1000, the total number of protein pheno­ The second major difference is that to understand types would be minute compared to the number of innovation, a detailed mechanistic understanding genotypes. In regulatory circuits and metabolism, of how phenotypes emerge from genotypes is cru­ different arguments lead to the same conclusion cial. The system classes I discuss here meet this (Wagner 2011). For example, in regulatory circuits requirement. It is not needed to study speciation involving N molecules, the number of regulatory on holey landscapes, where phenotypes (fitness) are genotypes scales with the number of possible inter­ typically assigned randomly to genotypes. Finally, I actions between molecules, and thus with W, note that innovations also occur in organisms where whereas the number of possible activity states reproductive isolation, and thus also reproductive scales with the number of molecules N. There will isolation by the holey-landscape mechanism, has therefore be more regulatory genotypes than phe­ limited relevance. They include asexual eukaryotes notypes. I finally note that if the number of geno­ and prokaryotic microbes with unusual forms of types did not exceed the number of phenotypes, sex. limovation does not generally require repro­ be they protein structures, gene expression pheno­ ductive isolation. types, metabolic phenotypes, or visible macroscopic How does a map from complex multidimen­ traits of organisms, then neutral mutations would sional genotypes to a multidimensional phenotypes not exist, contrary to what empirical genetic data relate to an Adaptive Landscape? To see the con­ suggest (Eyre-Walker and Keightley 2007). nection, we need to simplify this map a bit. Con­ Any systematic analysis of innovation requires sider the example of metabolic systems, and a phenotypes that are complex, multidimensional specific metabolic phenotype P . For this pheno­ objects, and not just simple scalars, as in many type, one can define an Adaptive Landscape in population genetic models of evolution. The phe­ metabolic genotype space as a function from this notypes in all three systems I discuss here meet space into the non-negative real numbers. Each this criterion. For example, metabolic phenotypes genotype's value of this function (its height in the can be represented as vectors in many dimensions, landscape) maps onto the distance of its phenotype as can the spatial coordinates of a folded protein's from P. Peaks of the landscape correspond to geno­ amino acids. types whose phenotypes are equal to P. Genotypes The concepts I discuss here share one com­ whose phenotype have a given distance from P monality with holey Adaptive Landscapes used to occur along the same contour lines (elevation) of the study speciation (Gavrilets 1997), but they differ landscape. in more important ways. The commonality is that Analogous definitions are possible for genotype­ genetic change can occur along ridgelines in a high­ phenotype maps in regulatory systems and in dimensional space. macromolecules, because one can define analogous The first major difference regards the multidi­ distances among their phenotypes (e.g. Schuster mensional nature of the phenotypes I consider. To et al. 1994; Ciliberti et al. 2007a). Exactly as for study speciation and reproductive isolation, it is metabolic systems, the height of a point (genotype) may be adequate to consider scalar-valued fitness­ corresponds to the distance its phenotype has from or, as in holey landscapes, binary fitness of val­ some focal phenotype. The genotypes with the focal ues 0 and I-as the only aspect of phenotype. (In phenotype are the peaks in this landscape. The more this case, the genotypes with fitness zero corre- distant a genotype's phenotype is from this focal HIGH-DIMENSIONAL ADAPTIVE LANDSCAPES FACILITATE EVOLUTIONARY INNOVAT ION 279 udy phenotype, the lower its elevation in this landscape. phenotypes limit its utility. This does not mean !ces­ This simplification renders the maps I consider that we should abandon the concept. There may be len­ instances of phenotype landscapes with a scalar philosophical quibbles about it (Provine 1986, dis­ ion, phenotype (see also Chapter 18 of this volume). cussed also in Chapter 2), but nothing speaks better are In Adaptive Landscapes thus defined, the exis­ to its success than its widespread usage almost a lere tence of genotype networks clearly violates our cenhtry after its conception.

~no- geometric intuition derived from low-dimensional The various incarnations the Adaptive Land­ less. spaces. For anyone phenotype, a genotype network scape has taken in the hands of different researchers and corresponds to a series of peaks that occur through­ (e.g., Chapters 2, 3, 9, 13, and 19) show that rather ling out genotype space, and that are all cOlmected to than abandoning the concept, we need to refine cru­ one another. No valleys separate these peaks. In it for specific purposes. To study llmovation, for this other words, they form an interlaced network of example, we can extend it to ftmctions on genotypes tion ridges reaching into distant corners of the space. To whose values are qualitatively different phenotypes I are confound our intuition further, this web of ridges in a high-dimensional space. In that case, mathe­ ly, I exists for many different focal phenotypes. Each of matical or computational analysis needs to replace lere them has its own genotype network, and all these our geometric llltuition. Further refulements will !tive genotype networks are tightly interwoven with one undoubtedly be necessary. For example, some phe­ has another (Schuster et al. 1994; Ciliberti et al. 2007b; notypes and even genotypes are best viewed as otes Rodrigues and Wagner 2009). objec.ts III a continuous and not in a discrete s of Fundamentally, the reason why genotype net­ space. Examples do not only lllclude models from pro- works exist is that typical genotypes with some quantitative genetics where genotypes are contlll­ phenotype P have many neighbors with the same uously valued genetic "variables" that tmderly a len- phenotype. If this was not the case, and if the geno­ uni- or multivariate continuous phenotype. They rpes types comprising a typical genotype network were also include, on the genotypic level, the regulatory otherwise randomly distributed in genotype space, genotypes of many signallllg circuits, which are these genotypes would be isolated from one anotller defuled tlu·ough parameters such as (contiImous) (Ciliberti et al. 2007a; Wagner 2011). In other words, association and dissociation constants, and reaction their to mutation brings forth the vast rates. genotype networks of which they are a part. On the phenotypic level, they include the ever­ The fact that a genotype can have many neigh­ changing conformations-defuled by the continu­ bors at all emerges from the high-dimensionality ous atomic coordinates of amino acids-tllat one of genotype space. In a reaction universe of 5000 protein can adopt through thermal noise, or the reactions, each metabolic network has 5000 neigh­ effectively continuous changes of concentrations bors. In a transcriptional regulation circuit of 20 that gene products can tmdergo. Where such con­ genes, there are of the order of 400 possible reg­ tinuous systems have been shtdied, phenomena ulatory interactions; each circuit thus has of the analogous to those in discrete systems seem to order of 400 neighbors. In a protein of 100 amino exist (Wagner 2005a; Giurumescu et al. 2009; Hafner acids and 20 possible amino acids, each genotype et al. 2009). However, we lack the theoretical has 100 x 19 = 1900 numbers of neighbors. Because fOtmdation to study the concepts I emphasized large numbers of neighbors are possible only in a here rigorously in such systems. For example, high-dimensional space, so is the possibility to have even just defining the analogue to a genotype many neighbors with the same phenotype, and thus network III a contllmous, high-dimensional geno­ the existence of vast genotype networks. type space presents challenges. If we meet these challenges, we may discover new worlds III the vast universe of genotype space. There is lit­ 17.8 Outlook tle doubt that Adaptive Landscapes, despite their Applying the Adaptive Landscape concept to low­ limitations, will help us in understanding this dimensional genotype spaces and to simple, scalar universe. 280 PART V DEVELOPMENT, FORM, AND FUNCTION

Acknowledgments regulatory gene networks. PLoS Computational Biol­ ogy, 3(2), e15. I thank Charles Goodnight, the editors, and an Ciliberti, S., Martin, O.e., and Wagner, A (2007b). Irmo­ anonymous reviewer for suggestions that helped vation and robush1ess in complex regulatory gene net­ me improve this manuscript. I also acknowledge works. Proceedings of the National Academy of Sci­ support through Swiss National Science Foun­ ences of the United States of America, 104, 13591-13596. dation grants 315200-116814, 315200-119697, and Cline, RE., Hill, RH., Phillips, D.L., and Needl1am, L.L. 315230-129708, as well as through the YeastX (1989). Pentachlorophenol measurements in body-fluids project of SystemsX.ch, and the University Priority of people in log homes and workplaces. Archives Research Program in Systems Biology at the Univer­ of Environmental Contamination and Toxicology, 18, 475-481. sity of Zurich. Conrad, M. (1990). The geometry of evolution. Biosys­ terns, 24, 61-81. Copley, S.D. (2000). Evolution of a metabolic pathway References for degradation of a toxic xenobiotic: the patchwork Albert, R and Othmer, H.G. (2003). The topology of the approach. Trends in Biochemical Sciences, 25, 261-265. regulatory interactions predicts the expression pattern Copley, RR and Bork, P. (2000). Homology among (ba)s of the segment polarity genes in Drosophila 1'Ilelal1ogaster. barrels: Implications for the evolution of metabolic Journal of Theoretical Biology, 223,1-18. pathways. Journal of Molecular Biology, 303, 627-640. Ancel, L.W. and Fontana, W. (2000). Plasticity, evolvabil­ Dantas, G., Sommer, M.O.A., Oluwasegun, RD., and ity, and modularity in RNA Journal of Experimental Church, G.M. (2008). Bacteria subsisting on antibiotics. Zoology /Molecular Development and Evolution, 288, Science, 320, 100-103. 242-283. Davidson, E.H. and Erwin, D.H. (2006). Gene regulatory Babajide, A, Hofacker, 1., SippI, M., and Stadler, P. (1997). networks and the evolution of animal body plans. Sci­ Neutral networks in protein space: a computational ence,311,796-800. study based on knowledge-based potentials of mean Eyre-Walker, A and Keightley, PD. (2007). The distribu­ force. Folding & Design, 2, 261-269. tion of fitness effects of new mutations. Nature Reviews Bastolla, u., Porto, M., Roman, H.E., and Vendruscolo, M. Genetics, 8, 610-618. (2003). Connectivity of neutral networks, overdisper­ Feist, AM., Henry, e.S., Reed, J.L., Krummenacker, M., sion, and structural conservation in protein evolution. Joyce, A R, Karp, P. D., et al. (2007). A genome­ Journal of Molecular Evolution, 56, 243-254. scale metabolic reconstruction for Escherichia coli K-12 Bharathan, G., Goliber, T.E., Moore, e., Kessler, S., Pham, MG1655 that accounts for 1260 ORFs and thermody­ T., and Sinha, N.R (2002). Homologies in leaf form namic information. Molecular Systems Biology, 3, 12l. inferred from KNOX! gene expression during develop­ Feist, AM., Herrgard, MI, Thiele, 1., Reed, J.L., and ment. Science, 296,1858-1860. Palsson, B.O. (2009). Reconstruction of biochemical net­ Carroll, S.B., Grenier, J.K., and Weatherbee, S.D. (2001). works in microorganisms. Nature Reviews Microbiol­ From DNA to diverSity. Molecular genetics and the evo­ ogy, 7, 129-143. lution of animal design. Blackwell, Malden, MA. Feist, AM. and Palsson, B.O. (2008). The growing scope of Causier, B., Castillo, R, Zhou, J.L., Ingram, R, Xue, Y, applications of genome-scale metabolic reconstructions Schwarz-Sommer, Z., et al. (2005). Evolution in action: using Escherichia coli. Nahlre Bioteclmology, 26, 659-667. Following function in duplicated floral homeotic genes. Ferrada, E. and Wagner, A. (2010). Evolutionary inno­ Current Biology, 15, 1508-1512. vation and the organization of protein functions in Chen, L.B., DeVries, AL., and Cheng, e.H.e. (1997) . Con­ sequence space. PLoS ONE, 5(11), e14172. vergent evolution of antifreeze glycoproteins in Antarc­ Forster, J., Famili, 1., Fu, P., Palsson, B., and Nielsen, J. tic notothenioid fish and Arctic cod. Proceedings of the (2003). Genome-scale reconstruction of the Saccha­ National Academy of Sciences of the United States of romyces cerevisiae metabolic network. Genome America, 94, 3817-3822. Research,13,244-253. Cheng, e.e.-H. (1998). Evolution of the diverse antifreeze Futuyma, D.J. (1998). Evolutionary Biology. Sinauer, Sun­ proteins. Current Opinion in Genetics & Development, derland, MA. 8,715-720. Gavrilets, S. (1997). Evolution and speciation on holey Ciliberti, S., Martin, O.e., and Wagner, A (2007a). Circuit adaptive landscapes. Trends in Ecology & Evolution, 12, topology and the evolution of robustness in complex 307-312. HIGH-DIMENSIONAL ADAPTIVE LANDSCAPES FACILITATE EVOLUTIONARY INNOVATION 281

Gavrilets, S. (2005). Fih1ess landscapes and the origu1 of Liang, Y.H., Hua, Z.Q., Liang, X., Xu, Q., and Lu, species. Princeton University Press, Princeton, NJ. G.Y. (2001). The crystal structure of bar-headed goose Gilbert, S.F. (1997). Developmental Biology. Sinauer, Slm­ hemoglobin in deoxy form: The allosteric mechanism of derland, MA. a hemoglobin species with high oxygen affinity. Journal Giurumescu, CA, Sternberg, P.W., and Asthagiri, AR. of Molecular Biology, 313, 123-137. (2009) . Predicting phenotypic diversity and the under­ Lipman, D. and Wilbur, W. (1991). Modeling neutral and lying quantitative molecular transitions. PLoS Compu­ selective evolution of protein folding. Proceedings of tational Biology,S, e1000354. the Royal Society of London Series B, 245, 7-11. Golding, G.B. and Dean, AM. (1998). The structural basis MacCarthy, T., Seymour, R., and Pomiankowski, A (2003). of molecular . Molecular Biology and Evolu­ The evolutionary potential of the Drosophila sex deter­ tion, 15,355-369. mination gene network. Journal of Theoretical Biology Goodman, M., Pedwaydon, J., Czelusniak, J., Suzuki, T., 225,461-468. Gotoh, T., Moens, L., et al. (1988). An evolutionary tree Ogata, H ., Goto, S., Sato, K, Fujibuchi, w., Bono, H., for invertebrate globin sequences. Journal of Molecular and Kanehisa, M. (1999). KEGG: Kyoto Encyclopedia Evolution, 27, 236-249. of Genes and Genomes. Nucleic Acids Research, 27, Hafner, M., Koeppl, H., Hasler, M., and Wagner, A 29-34. (2009). "Glocal" robustness U1 model discrintination for Provine, W.B. (1986). Sewall Wright and evolutionary biol­ circadian oscillators. PLoS Computational Biology,S, ogy. University of Chicago Press, Chicago, IL. e1000534. Raman, K and Wagner, A. (2011). and robust­ Huang, W., Petrosino, J., Hirsch, M., Shenkin, P., and ness in a complex signaling circuit Molecular Biosys­ Palzkill, T. (1996). Amino acid sequence determll1ants of tems, 7, 1081-1092. beta-Iactamase structure and activity. Journal of Molec­ Rehmann, L. and Daugulis, AJ. (2008). Enhancement of ular Biology, 258, 688-703. PCB degradation by Burkholderia xenovorans LB400 in Hueber, S.D. and Lohmann, 1. (2008). Shaping segments: biphasic systems by manipulating culture conditions. Hox gene function in the genomic age. Bioessays, 30, Bioteclmology and Bioengineering, 99, 521-528. 965- 979. Reidys, C , Stadler, P., and Schuster, P. (1997). Generic Hughes, CL. and Kaufman, T.C (2002) . Hox genes and properties of combinatory maps: Neutral networks of the evolution of the arthropod body plan. Evolution & RNA secondary structures. Bulletin of Mathematical Development, 4, 459-499. Biology, 59, 339-397. Huynen, M., Stadler, P., and Fontana, W. (1996). Smooth­ Rennell, D., Bouvier, S., Hardy, L., and Poteete, A (1991). ness within ruggedness: The role of neutrality in adap­ Systematic mutation of bacteriophage T4 lysozyme. tation. Proceedings of the National Academy of Sciences Journal of Molecular Biology, 222, 67-87. of the United States of America, 93, 397-40l. Rodrigues, J.F. and Wagner, A (2009). Evolutionary plas­ Irish, v.F. (2003). The evolution of floral homeotic gene ticity and innovations in complex metabolic reaction nmction. Bioessays, 25, 637-646. networks. PLoS Computational Biology,S, e1000613. Jaeger, J., Surkova, S., Blagov, M., Janssens, H., Kosman, Rodrigues, I.F. and Wagner, A (2011). Genotype networks D., Kozlov, KN., et al. (2004) . Dynamic control of umovation, and robustness in sulfur metabolism. BMC positional information in the early Drosophila embryo. Systems Biology,S, 39. Nature, 430, 368-37l. Rost, B. (1997). Protein structures sustain evolutionary Kauffman, S. and Levin, S. (1987). Towards a general­ drift. Folding & Design, 2, S19-S24 theory of adaptive walks on rugged landscapes. Journal Sama!, A, Rodrigues, J.F.M., Jost, J., Martin, O.C, and of Theoretical Biology, 128, 11-45. Wagner, A. (2010). Genotype networks in metabolic Kleina, L. and Miller, J. (1990). Genetic studies of the reaction spaces. BMC Systems Biology, 4, 30. lac repressor. 13. Extensive amino-acid replacements Sanchez, L., Chaouiya, C, and Thieffry, D. (2008). generated by the use of natural and synthetic non­ Segmenting the fly embryo: logical analysis of the sense suppressors. Journal of Molecular Biology, 212, role of the Segment Polarity cross-regulatory module. 295-318. International Journal of Developmental Biology, 52, Lemons, D. and McGinnis, W. (2006). Genomic evolution 1059-1075. of Hox gene clusters. Science, 313, 1918-1922. Schillu1g, CH., Edwards, I.S., and Palsson, B.O. (1999). Levitt, M. (2009). Nature of the protein universe. Proceed­ Toward metabolic phenomics: Analysis of genomic ings of the National Academy of Sciences of the United data using flux balances. Biotechnology Progress, 15, States of America, 106, 11079-11084. 288-295. 282 PART V DEVELOPMENT, FORM, AND FUNCTION

Schuster, P. (2006). Prediction of RNA secondary struc­ van der Meer, J.R., Werlen, C, Nishino, S.F., and Spain, tures: from theory to models and real molecules. J.C (1998). Evolution of a pathway for chlorobenzene Reports on Progress in Physics, 69, 1419-1477. metabolism leads to natural attenuation in contami­ Schuster, P., Fontana, W., Stadler, P. , and Hofacker, I. nated groundwater. Applied and Enviromnental Micro­ (1994). From sequences to shapes and back-a case­ biology, 64, 4185-4193. study in RNA secondary structures. Proceedings of the von Dassow, G., Meir, E., Munro, E., and Odell, G. (2000). Royal Society of London Series B, 255, 279-284. The segment polarity network is a robust development Searcoid, M.a. (2007). Metric spaces. Springer, London. module. Nature, 406, 188-192. Sumedha, Martin, O.C, Wagner, A (2007). New structural Wagner, A (2005a). Circuit topology and the evolution of variation in evolutionary searches of RNA neutral net­ robustness U;' two-gene circadian oscillators. Proceed­ works. Biosystems, 90, 475-485. ings of the National Academy of Sciences of the United Takiguchi, M., Matsubasa, T., Amaya, Y., and Mori, M. States of America, 102, 11775- 11780. (1989). Evolutionary aspects of urea cycle enzyme Wagner, A (2005b). Robustness and evolvability in living genes. Bioessays, 10, 163-166. systems. Princeton University Press, Princeton, NJ. Thornton, J., Orengo, C, Todd, A, and Pearl, F. (1999). Pro­ Wagnel~ A (2008). Neutralism and selectionism: A tein folds, functions and evolution. Journal of Molecular network-based reconciliation. Nature Reviews Genet­ Biology, 293, 333-342. ics, 9, 965-974. Todd, A, Orengo, C, and Thornton, J. (1999). Evolution of Wagner, A (2011). The origins of evolutionary innova­ protein function, from a structural perspective. Current tions. A theory of transfonnative change in living sys­ Opinion in Chemical Biology, 3, 548-556. tems. Oxford University Press, Oxford. Todd, A, Orengo, C, and Thornton, J. (2001). Evolution of Wagner, G.P', Amemiya, C , and Ruddle, F. (2003. Hox function in protein superfamilies, from a structural per­ cluster duplications and the opportunity for evolution­ spective. Journal of Molecular Biology, 307, 1113-1143. ary novelties. Proceedings of the National Academy van der Meer, J.R. (1995). Evolution of novel metabolic of Sciences of the United States of America, 100, pathways for the degradation of chloroaromatic com­ 14603- 14606. pounds. Presented at Beijerinck Centennial Sympo­ Wright, S. (1932. The roles of mutation, inbreeding, sium on Microbial Physiology and Gene Regulation­ crossbreeding, and selection in evolution. Proceedings Emerging Principles and Applications, The Hague, of the Sixth International Congress on Genetics, 1, Netherlands, December. 355-366.