Satisfiability and Evolution
Total Page:16
File Type:pdf, Size:1020Kb
2014 IEEE Annual Symposium on Foundations of Computer Science Satisfiability and Evolution Adi Livnat Christos Papadimitriou Aviad Rubinstein Department of Biological Sciences Computer Science Division Computer Science Division Virginia Tech. University of California at Berkeley University of California at Berkeley [email protected] [email protected] [email protected] Gregory Valiant Andrew Wan Computer Science Department Simons Institute Stanford University University of California at Berkeley [email protected] [email protected] Abstract—We show that, if truth assignments on n variables genes are able to efficiently arise in polynomial populations reproduce through recombination so that satisfaction of a par- with recombination. In the standard view of evolution, a ticular Boolean function confers a small evolutionary advan- variant of a particular gene is more likely to spread across tage, then a polynomially large population over polynomially many generations (polynomial in n and the inverse of the initial a population if it makes its own contribution to the overall satisfaction probability) will end up almost surely consisting fitness, independent of the contributions of variants of other exclusively of satisfying truth assignments. We argue that this genes. How can complex, multi-gene traits spread in a theorem sheds light on the problem of the evolution of complex population? This may seem to be especially problematic adaptations. for multi-gene traits whose contribution to fitness does not Keywords-evolution; algorithms; Boolean functions decompose into small additive components associated with each gene variant —traits with the property that even if I. INTRODUCTION one gene variant is different from the one that is needed The TCS community has a long history of applying its for the right combination, there is no benefit, or even a net perspectives and tools to better understand the processes negative benefit. Here, we provide one rigorous argument for around us; from learning to multi-agent systems, game how such complex traits can efficiently spread throughout theory and mechanism design. By and large, the efforts a population. While we consider this question in a model to understand these areas from a rigorous and algorithmic that makes considerable (but justifiable) simplifications, this perspective have been very successful, leading to both rich model makes a theoretically rigorous contribution to the fun- theories and practical contributions. Evolution is, perhaps, damental problem of how evolution can produce complexity. one of the most blatantly algorithmic processes, yet our Motivating example: Waddington’s experiment.: In computational understanding of it is still in its infancy 1953 the great experimentalist Conrad Waddington exposed (see [16] for a pioneering study), and we currently lack the pupae of a population of Drosophilia melanogaster a computational theory explaining its apparent success. to a heat shock, and noticed that in some of the adults Algorithmically, how plausible are the origins of evolution that developed, the appearance of the wings had changed and the emergence of self-replication? Is evolution sur- (they lacked a complete posterior crossvein) [17]. He then prisingly efficient or surprisingly inefficient? What are the maintained a population of flies where only those with necessary criteria for evolution-like algorithms to yield rich, altered wings were allowed to reproduce. By repeating the interesting, and diverse ecosystems? Why is recombination procedure of heat shock and selection over the generations, (i.e., sexual reproduction) more successful than asexual the percentage of flies with altered wings increased over time reproduction? Given the reshuffling of genomes that occurs to values close to one. Even more interestingly, beginning at through recombination, how can complex traits that depend generation fourteen, some flies exhibited the new trait even on many different genes arise and spread in a population? without having been treated with heat shock. In this work, we begin to tackle this last question of At first sight, this surprising phenomenon — known as why complex traits that may depend on many different genetic assimilation — recalls Lamarck’s now discredited belief that acquired traits can be inherited. However, Boolean A.L. would like to acknowledge start-up funds from Virgina Tech. C.P. functions provide a purely genetic explanation, which ex- and A.R. are supported by NSF grants CCF0964033 and CCF1408635, and tends the idea originally offered informally by Stern [15] by Templeton Foundation grant 3966. A.W. conducted part of this research while at Harvard University and IIIS, Tsinghua University and supported (see also [2], [5]): Suppose that the phenotype “altered by NSF grant CCF-0964401. wings” is a Boolean function of n genes x1,...,xn with 0272-5428/14 $31.00 © 2014 IEEE 524 DOI 10.1109/FOCS.2014.62 two alleles (variants) each, thought as {−1, 1} variables, Populations of truth assignments and of another {−1, 1} variable h (standing for “high This way of looking at Waddington’s experiment brings temperature”). 1 about a very natural question: Is this amplification of satis- (1 + h) fying truth assignments (outcomes (c) and (d) of experiment x1 + x2 + ···+ xn + · k ≥ n, 2 described above) a property of threshold functions, or is it for some integer k (think of n ≈ 10 , and k ≈ n/3). more general? Does it hold for all monotone functions, for To see how the percentage of flies with altered wings example? For all Boolean functions? {− }n → increases in the population over time, we track the allele Consider any satisfiable Boolean function f : 1, 1 t {0, 1} of n binary genes (in the absence of the environmental frequencies from generation to generation. Let μi be the average value of xi in the population at time t, and assume variable h which was crucial in Waddington’s phenomenon). the genotype frequencies at time t are distributed according What if genotypes satisfying this Boolean function had a to a distribution μt (the reason for denoting the distribution slight advantage under natural selection? (In Waddington’s this way will become clear). If mating occurs at random experiment, they had an absolute advantage because of the with free recombination2 then, in expectation, the average experimental design.) For example, imagine that genotypes value of each xi in the next generation is given by: satisfying f survive to adulthood more than the others, in expectation, by a factor of (1 + ), for some small E t · t+1 μ [f(x) xi] >0. Would this trait (that is, satisfaction of the Boolean μi = , (1) Eμt [f] function f) be eventually fixed in the population? And, if so, where f(x)=1exactly when a fly having genotype could this be a subtle mechanism for introducing complex x will develop altered wings (i.e., the above inequality adaptation in a population? is satisfied) and f(x)=0otherwise. We then assume To reflect our assumption that satisfaction of f confers {− }n → that the next generation will be distributed according to a only an -advantage, we may take a function f : 1, 1 t+1 { } product distribution μ , where each xi has expectation 1, 1+ , where we regard the value 1+ as “satisfied”, t+1 and the value 1 as “unsatisfied”. We track the allele fre- μi . By approximating the genotype frequencies of the quencies from generation to generation as in Waddington’s population for each generation in this way, it can be shown t+1 by calculation that a trait with this genotypic specification experiment: Equation (1) gives us the average value μi of each xi in the next generation, and we describe the next (a) is very rare in the population under normal temperature t+1 h =1; (b) it becomes much more common under high generation by the product distribution μ . temperature h =1; (c) jumps to just above 50% after the Suppose that we continue this process, starting from distribution μ0, and defining first breeding under h =1; (d) after successive breedings 1 2 t {μi }, {μi },...,{μi} ... as above. Consider the average with h =1it is nearly fixed; and (e) if after this h becomes t − fitness of the population at time t, defined as μ (f)= 1, the trait is still quite common. t Note: Our interpretation of Waddington’s experiment is Prμt [f(x)=1+]. The question is, when does μ (f) a simplification. First, we consider only the distribution of approach one? Our first result states that, for monotone n genotypes in each generation to determine the distribution functions, it does after O μ0(f) steps: of the next; instead, we could first take a finite sample t ≥ − n(1+) according to the present distribution and use that sample to Theorem 1. If f is monotone, then μ (f) 1 tμ0(f) . calculate the distribution of the next generation. Such an ap- Note: This nontrivial result also serves to illustrate proximation can only become exact when the population size one point: The work is not about satisfiability heuristics is infinite, but it is a standard and useful one in population (monotone functions are not an impressive benchmark in genetics (and we shall eventually consider finite populations this regard...). Heuristics are about finding good individuals for our main result). We also assumed that each individual in a population. In contrast, evolution is about creating good of the new generation is produced by sampling each gene populations. This is our focus here. independently of the other genes, and with probability equal to the frequencies of the two alleles of this gene in the Our ambition is to prove the same result for all Boolean parent population (the adults of the previous generation with functions. Immediately we see that this is impossible if altered wings). This assumption turns out to be justified in we insist on an infinite population: Consider the function the settings that we will consider, as will be discussed in the f = x1 ⊕ x2: starting with the uniform distribution at time following section.