Chapter 10 Toward a Theory of Multilevel Evolution: Long-Term Information Integration Shapes the Mutational Landscape and Enhances Evolvability

Paulien Hogeweg

P. Hogeweg, Theoretical Biology and Bioinformatics Group, Utrecht University, Padualaan 8, 3584CH Utrecht, The Netherlands. e-mail: [email protected]

O.S. Soyer (ed.), Evolutionary Systems Biology, Advances in Experimental Medicine and Biology 751, DOI 10.1007/978-1-4614-3567-9_10, © Springer Science+Business Media, LLC 2012

Abstract

Most of evolutionary theory has abstracted away from how information is coded in the genome and how this information is transformed into traits on which selection takes place. In the earliest stages of biological evolution, in the RNA world, the mapping from genotype to function was largely predefined by the physical–chemical properties of the evolving entities (RNA replicators; e.g., from sequence to folded structure and catalytic sites). In present-day organisms, by contrast, the mapping itself is the result of evolution. I will review results of several in silico evolutionary studies which examine the consequences of evolving the genetic coding, and the ways this information is transformed, while adapting to prevailing environments. Such multilevel evolution leads to long-term information integration. Through genome, network, and dynamical structuring, the occurrence and/or effect of random mutations become nonrandom, which facilitates rapid adaptation. This is what does happen in the in silico experiments. Is it also what did happen in biological evolution? I will discuss some data that suggest that it did. In any case, these results provide us with novel search images to tackle the wealth of biological data.

1 Introduction

Much of current research in biology is on the physical and biochemical basis of information processing in cells. This information processing transforms the inherited genotypic information into a living organism sufficiently adapted to its environment to survive. Most of these processes were unknown to Darwin when he formulated the theory of evolution by natural selection. Since Darwin’s time, and the development of population genetics, the major paradigm of evolutionary biology has been to largely ignore, or at least drastically simplify, the way information is coded and transformed. When the “small phenotypic variations” envisioned by Darwin were translated into allele frequencies and nucleotide substitutions, a direct connection between the level of mutations and the level of observation was largely maintained. Because of, or despite, this simplification, evolutionary theory could remain the cornerstone of biological thinking through all the changes in understanding the underlying processes in biological systems.

Recent advances in high-throughput techniques are producing a wealth of data on the structure of genomes, regulatory networks, protein interaction networks, all types of posttranscriptional and posttranslational modifications, etc., which together determine the genotype to phenotype mapping. On the basis of this wealth of data, systems biology tries to understand the working of present-day organisms, using a combination of data analysis, mathematical/computational modeling, and experiments.

Combining systems biology and evolutionary theory is fruitful in at least three different ways. In the first place, an evolutionary perspective provides very powerful tools for analyzing the high-throughput data and understanding the functioning of current life-forms. For example, phylogenetic profiling can be used to predict that genes function in the same process or pathway when they are (repeatedly) lost in the same lineages [33].
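The idea behind phylogenetic profiling can be illustrated with a minimal sketch. It is not the method of [33]: the gene names, the lineages, and the presence/absence values below are hypothetical, and the scoring (fewest mismatches first, then most shared absences) is a deliberately crude assumption that treats every lineage as an independent observation and ignores the phylogeny.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) of four genes in six lineages.
profiles = {
    "geneA": (1, 1, 0, 1, 0, 1),
    "geneB": (1, 1, 0, 1, 0, 1),
    "geneC": (1, 0, 1, 1, 1, 0),
    "geneD": (1, 1, 0, 1, 1, 1),
}

def hamming(p, q):
    """Number of lineages in which the two genes differ in presence."""
    return sum(a != b for a, b in zip(p, q))

def shared_losses(p, q):
    """Number of lineages from which both genes are absent."""
    return sum(a == 0 and b == 0 for a, b in zip(p, q))

def rank_pairs(profiles):
    """Rank gene pairs: fewest mismatches first, then most shared absences."""
    scored = []
    for g, h in combinations(sorted(profiles), 2):
        p, q = profiles[g], profiles[h]
        scored.append((hamming(p, q), -shared_losses(p, q), g, h))
    return [(g, h, d, -s) for d, s, g, h in sorted(scored)]

ranking = rank_pairs(profiles)
for g, h, mismatches, losses in ranking:
    print(f"{g}-{h}: mismatches={mismatches}, shared absences={losses}")
```

In this toy data set, geneA and geneB are absent from exactly the same two lineages, so they rank first as candidates for a shared process or pathway. A realistic analysis would have to map losses onto the tree to establish that they occurred repeatedly and independently.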
Also, multilevel evolutionary modeling can help to zoom in on the relevant parameter values governing regulatory interactions [62]. Secondly, the high-throughput data have shed exciting new light on what did happen in long-term evolution and what does happen in short-term evolution. For example, phylogenetic reconstructions based on fully sequenced genomes have highlighted the unexpected importance of gene loss in adaptive evolution (e.g., [11, 23, 28]), and short-term evolutionary experiments have shown the frequent occurrence of large-scale mutations like gross chromosomal rearrangements (GCRs) [15], and massive changes in transcription in very short-term adaptation [16].

In this chapter, we explore a third meaning of the term evolutionary systems biology, namely, how insights obtained by systems biology can enrich the theory of evolution itself. In particular, we want to investigate the effects of complex, multilevel genotype–phenotype mapping, and its evolution, on evolutionary dynamics. We seek “generic patterns,” i.e., we seek a baseline for what we should expect given our current knowledge or, to use the words of Koonin [39], universal laws governing evolving systems. Koonin looks for such “universal laws” by examining the data. We look for such generic patterns by studying models with many degrees of freedom and observing, against the background of the implemented mutation selection procedure, the emerging evolutionary patterns.

We use nonsupervised modeling (or non-goal-directed modeling) [24, 26]. This concept can best be explained by analogy with nonsupervised pattern analysis (or nonsupervised learning), as opposed to supervised pattern analysis. In nonsupervised pattern analysis (e.g., cluster analysis), a description is given, and patterns that are not predefined are sought, whereas in supervised pattern analysis, a pattern (e.g., a classification) is given, and a description is sought which allows the recognition of the classes.
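The contrast between the two kinds of pattern analysis can be made concrete with a small sketch. The data, the labels "small" and "large", and both function names are hypothetical, and the two procedures (a one-dimensional two-means clustering and a midpoint threshold between class means) are chosen only for brevity. The clustering assumes the data contain at least two distinct values, and the classifier assumes exactly two classes.

```python
# Hypothetical one-dimensional measurements of some trait.
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]

def two_means(values, iterations=10):
    """Nonsupervised: only the description (the values) is given;
    two groups that were not predefined are sought."""
    lo, hi = min(values), max(values)  # initial cluster centres
    for _ in range(iterations):
        left = [v for v in values if abs(v - lo) <= abs(v - hi)]
        right = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = sum(left) / len(left), sum(right) / len(right)
    return left, right

def fit_threshold(values, labels):
    """Supervised: the classes are given; a description (here a threshold)
    is sought that allows the classes to be recognized."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    (low_lab, low_mean), (high_lab, high_mean) = sorted(
        ((lab, sum(vs) / len(vs)) for lab, vs in groups.items()),
        key=lambda item: item[1],
    )
    threshold = (low_mean + high_mean) / 2
    return lambda v: low_lab if v < threshold else high_lab

# Nonsupervised: the grouping emerges from the data.
left, right = two_means(data)
print(sorted(left), sorted(right))

# Supervised: the grouping is supplied, and a recognizer is learned.
labels = ["small", "small", "small", "large", "large", "large"]
classify = fit_threshold(data, labels)
print(classify(0.5), classify(4.0))
```

In the first call nothing about the groups is specified in advance; in the second the groups are the input and the threshold is the result.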
Likewise, in nonsupervised modeling, the model does not try to find an explanation for predefined phenomena. Instead, structured objects and their possible transformations and interactions are defined, and the emerging patterns are studied, focusing on those patterns which are not implemented or represented in the model directly. Accordingly, in nonsupervised evolutionary modeling, we are not interested in the fitness attained, but in the structural side effects of attaining fitness. The advantage of such an approach is that we can find, as in the pattern analysis counterpart, truly unexpected patterns. Moreover, apparently unrelated phenomena may appear as the side effects of the same basic processes. Another advantage is that we can retain some of the complexity which is the hallmark of biological systems, e.g., large genomes, and the complexity of the mapping of genome into the phenotype.

In formulating these models, we adhere to the well-known dictum “models should be as simple as possible, but not more so” (see footnote 1). We think that abstracting from the multilevel nature of biological systems constitutes too drastic a simplification. Instead, we study the consequences of the multilevel nature in models which are as simple as possible. An apparent disadvantage is that we can only study particular examples. That is in fact what Darwin did and what biologists still do in studying a limited number of model organisms. I will argue that by studying well-chosen examples, we can attain more generality than by molding our models into too much generality beforehand.

In line with this methodology, I will review in this chapter a number of specific models we studied recently and later point out more general patterns in the results. I will first review the by now classical results on the shape of fitness landscapes of high-dimensional genotype spaces with a complex structural mapping of genotype to fitness, as gleaned from studying RNA landscapes.
Next, I will use a more flexible genotype representation, adding successive layers in the mapping from the genome to the structure and/or dynamics which determines fitness. We show that the properties of the fixed landscapes still hold but are significantly enriched in this more open-ended setting. Moreover, new patterns arise, which indicate that surprising features gleaned from phylogenetic studies may be generic patterns of multilevel evolution. Finally, adding an ecological level, I probe how new levels of selection emerge and how these levels of selection may feed back on the genome, generating a more complex genomic organization.

Together, these examples start to outline the contours of a theory of multilevel evolution and suggest that the multilevel nature of biological systems allows for long-term information integration. A striking consequence of this long-term information integration is that mutation and selection are no longer independent: the types of mutations which can/will happen in evolved systems, as well as their effect, are shaped by past selection. In other words, “random mutations are not random” in evolved systems.

Footnote 1: This dictum is often attributed to Einstein (e.g., [42]), although he never said it in this form. Nevertheless, it remains a nice pointer to emphasize that on the one hand, models should not incorporate unnecessary detail, but on the other hand should not overlook (and thereby obscure) essential features of the process modeled.

2 High Dimensional Genotype Space with Nonlinear, Redundant Mapping from Genotype to Phenotype

A hallmark of biological systems is the very large genotype space. An often used visualization of evolutionary processes makes use of the concept of a fitness landscape, first introduced by Sewall Wright [71]. However, our intuition about landscapes in general and fitness landscapes in particular is strongly biased to lower

