bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 Title: Is predictable? under complex - maps

*1 *1,2,3 2 Authors: Lisandro Milocco , Isaac Salazar-Ciudad

3 Affiliations:

4 1Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland

5 2Centre de Recerca Matemàtica, 08193 Barcelona, Spain

6 3Genomics, Bioinformatics and Evolution. Departament de Genètica i Microbiologia, Universitat

7 Autònoma de Barcelona, 08193 Barcelona, Spain

8 *Correspondence to:

9 Isaac Salazar-Ciudad. Email: [email protected] Phone: +358 4148 26384

10 Lisandro Milocco, [email protected]. Phone: +358 4148 26384

11

1 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 Abstract: A fundamental aim of post-genomic 21st century biology is to understand the

2 genotype-phenotype map (GPM) or how specific genetic variation relates to specific phenotypic

3 variation (1). Quantitative genetics approximates such maps using linear models, and has

4 developed methods to predict the response to selection in a population (2, 3). The other major

5 field of research concerned with the GPM, developmental or evo-devo (1,

6 4–6), has found the GPM to be highly nonlinear and complex (4, 7). Here we quantify how the

7 predictions of quantitative genetics are affected by the complex, nonlinear maps found in

8 developmental biology. We combine a realistic development-based GPM model and a population

9 genetics model of recombination, and . Each individual in the

10 population consists of a genotype and a multi-trait phenotype that arises through the

11 development model. We simulate evolution by applying natural selection on multiple traits per

12 individual. In addition, we estimate the quantitative genetics parameters required to predict the

13 response to selection. We found that the disagreements between predicted and observed

14 responses to selection are common, roughly in a third of generations, and are highly dependent

15 on the traits being selected. These disagreements are systematic and related to the nonlinear

16 nature of the genotype-phenotype map. Our results are a step towards integrating the fields

17 studying the GPM.

18

19 Introduction

20 A fundamental aim of post-genomic 21st century biology is to understand how genomic variation

21 relates to specific phenotypic variation. This relationship, called the genotype-phenotype map or

22 GPM, is considered by many researchers to be critical factor for understanding phenotypic

23 evolution (1). The GPM determines which phenotypic variation arises from which random

2 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 genetic variation. Natural selection then acts on that realized phenotypic variation. Thus, natural

2 selection and the GPM jointly determine how traits change over evolutionary time.

3 Quantitative genetics uses a statistical approach to describe the GPM and predict

4 phenotypic change in evolution by natural selection. This approach has long made significant

5 contributions to plant and animal breeding (2,3). According to the breeder’s equation of

6 quantitative genetics (8, 9), the response to selection in a set of traits ( ), is the product of the

7 matrix of additive genetic variances and covariances between traits (the G-matrix) and the

8 selection gradient ( ), i.e. the direct strength of selection acting on each trait.

9

10 This is the canonical equation for inferring past selection and predicting future responses to

11 selection (10, 11). The G-matrix has been interpreted as a measure of genetic constraints on

12 future evolution (12), the diagonal elements of G measure the short-term readiness of a trait to

13 respond to selection while the off-diagonal elements measure how a trait evolution is slowed

14 down (or accelerated) by the evolution in other traits.

15 Developmental evolutionary biology or evo-devo (1, 4-6) is the other main field

16 concerned with the GPM. Evo-devo views the GPM as highly nonlinear and complex (4, 7).

17 Most evo-devo studies, however, do not consider the population level. At that level, it has been

18 suggested that the nonlinearities of the GPM will average out, and hence would not affect the

19 accuracy of quantitative genetics (13), at least in the short-term (14).

20 Our aim is to study how the predictions of the multivariate breeder’s equation are

21 affected when considering the complex and nonlinear GPMs found in the study of development.

22 For that purpose, we combine a computational GPM model that is based on current

23 understanding of developmental biology (15, 16) and a population genetics model with mutation,

3 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 recombination, drift and natural selection. The developmental model has been shown to be able

2 to reproduce multivariate morphological variation at the population level (15). It includes a

3 network of gene regulatory interactions and cell and tissue biomechanical interactions. The

4 model’s parameters specify how strong or weak those interactions are. The value of each

5 individual’s parameter are determined additively by many loci. The developmental model

6 produces then, for each individual, a 3D morphological phenotype (see Fig. S1) on which a

7 number of traits are measured: the position of specific morphological landmarks. Individual

8 is calculated as the distance between each of five traits in each individual and these same

9 traits in each simulation’s optimal morphology (see Supplementary Figs. 1, 2). The processes of

10 mutation, development and selection are iterated over generations to simulate trait evolution. At

11 the same time, we estimate G, P and s in each generation and use the multivariate breeder’s

12 equation to estimate the expected response to selection per generation. We then quantify the

13 difference between expected and observed trait changes in the simulations with a complex

14 development-based GPM (see Fig. S3). The difference we estimate should be regarded as the

15 minimal theoretically possible since we include no environmental noise (i.e. no environmental

16 variance) and G, P and s are estimated in each generation.

17 Results

18 We found that in many generations there is a significant discrepancy, or prediction error,

19 between the trait changes observed in the simulations and the trait changes predicted from the

20 multivariate breeder’s equations (see Fig. 1, Supplementary Video 1, 2, for two example

21 simulations). To study whether this prediction error is merely due to stochastic processes such as

22 drift, mutation or recombination, we re-simulated 40 times each generation of every simulation.

23 Since mutation, recombination and genetic drift are simulated as stochastic processes, different

4 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 trait changes are observed in each of these 40 simulations (see the 99% confidence intervals

2 shown in a gray shade in Fig. 1). If the multivariate breeder’s equation is accurate, the expected

3 trait changes should not be statistically different from the the mean of these 40 observed changes.

4 However, as can be seen in the simulations in Fig. 1, the deviation is significant for a large

5 number of generations. There is, thus, a systematic prediction error that is not explainable from

6 stochastic processes. We refer to this systematic error as bias. For the simulation shown in Fig.

7 1A, the bias is large and significant between generations 15 and 55.

8 Bias was not exclusive to a single trait or simulation (see Fig. S4). Bias was common.

9 Fig. 2 shows the percentage of generations across all simulations that showed significant bias

10 with 99% confidence. For individual simulations the median was that 30% of the generations

11 showed significant bias in traits 2 and 4 (the x-position of each lateral cusps) and less so for the

12 three traits. Bias can be very large, see Fig. 2B. Considering all traits together, the prediction

13 error divided by the observed change per generation is at least 46% for half of the generations

14 exhibiting significant bias.

15 We found that in our simulations bias arises from nonlinearities in the GPM. Fig. 3 and

16 Fig. S5 show explanatory examples of how specific aspects of the developmental dynamics lead

17 to nonlinearities and then to bias. Bias correlates with measures of the linearity of the GPM

18 around the region of the parameter space where the population is distributed at a given

19 generation (see Fig. S6). These nonlinear regions are quite common and, thus, most populations

20 would pass through one such regions as they evolve.

21 The bias we encountered is not specific of our development model. We have ran also a

22 model not based on development, in which traits were simple mathematical functions of some

23 arbitrary model parameters (see Supplementary Information). When a linear map was used, no

5 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 prediction bias was found. In contrast, nonlinear GPMs lead to bias as with our development

2 model (Fig. S7).

3 Population parameters such as population size, mutation rate, selection strength and

4 number of loci had a modest effect on bias (Fig. 4). They affect how fast the population moves in

5 the trait and developmental parameter space and then, indirectly, the likelihood of encountering a

6 nonlinear region of the parameter space leading to bias (see Fig. 4 and Supplementary

7 Information).

8

9 Discussion

10 Bias can be understood better by considering how additive genetic variances are

11 estimated. Although there are several ways to do so (see Materials and Methods), they all assume

12 a linear relationship between traits and the genetic relatedness between individuals. This

13 assumption does not always hold in our development-based model, which means that the linear

14 transformation used in breeder's equation can significantly over- or underestimate the change in

15 the mean of traits. The examples in Fig. 3 and Fig. S8 show, for three generations with bias, the

16 parent-offspring regression for a trait, and the region of the developmental parameter occupied

17 by the population. In this region, small genetic changes lead to small changes in the

18 developmental parameters but these lead to relatively large changes in some traits. As a result,

19 when the population crosses this region of the developmental parameter space on its way to the

20 optimum, the change in some of the traits is different to that predicted from the linear

21 approximation of the multivariate breeder’s equation.

22 Even when variation no variation is observed beyond some critical trait value bias can

23 still occur (Figure 3E). Some combinations of traits values, i.e. some morphologies, can not arise

6 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 in the development model, at least not without dramatic changes in the developmental

2 parameters. As a result, it can sometimes occur that a trait can not evolve beyond the critical trait

3 value even if there is a selective pressure for it. In this situation bias arises because, even if there

4 is no variation above or below the critical trait value, there is trait variation before the population

5 reaches this value. Because this variation, the parent-offspring regressions for that trait (and

6 between this trait and the others) are non-zero; see Figure S9. Based on these regressions the

7 multivariate breeder’s equation estimates that variation beyond this critical trait value should

8 exist, and thus a possible response to selection, when in fact it does not (see Figure S9). This

9 prediction bias can persist over time because even if the population is pushed by selection close

10 to the critical trait value, mutation and recombination spread it over a range of values before the

11 threshold and, thus, a non-zero parent-offspring regression persists (see Figure S9B).

12 There is already some theoretical literature stating that the assumptions of the breeders

13 equations may not always hold. Most previous studies, however, do not directly consider the

14 GPM (17, 18). An exception is Rice’s formalization of adaptation under known and nonlinear

15 GPMs (19). This previous work, however, does not explain how nonlinear GPMs may actually

16 be. Our study quantifies to which extent the predictions of breeder’s equations hold for realistic

17 development-based GPMs as we currently understand them. Compared to theoretical

18 evolutionary genetics models (20, 21), our model does not assume a specific pattern of

19 among loci, instead, such patterns arise from the dynamics of the development model and are not

20 compatible with these assumed patterns (see Fig. S10 and Supplementary Information).

21 Although our development model is based on tooth morphogenesis, there is no reason to

22 expect that the complexity of the GPM should be very different for other organs of a similar

23 morphological complexity (22, 23). In fact, in the supplementary information we explain how

7 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 even arbitrary GPMs, as long as they are nonlinear, would produce similar biases (see Fig. S7).

2 In addition, most, if not all, mathematical models of development or gene network dynamics are

3 highly nonlinear and lead to complex GPMs similar to those in our model (23, 24).

4 The prevailing view in quantitative genetics is that breeder’s equations is very accurate

5 when applied to single traits (3). For selection for multiple traits at the same time or when

6 selecting for some traits and studying the correlated response in other traits, however, the scarce

7 empirical evidence does not always fit the expectations from breeder's equation (3, 25-27). Our

8 results suggest a simple reason for these inaccuracies and provide a theoretical nonlinear-GPM

9 perspective on what to expect when multiple trait selection experiments finally become more

10 common.

11 There is a long ongoing apparent discrepancy between evolutionary developmental

12 biology and quantitative genetics (7, 9, 12, 14). One approach views the GPM as simple enough,

13 at least in practice, for trait evolution to be predictable from linear statistical approaches. The

14 other views the GPM as highly nonlinear. Our results present a potential point of connection

15 between these two views. The predictions of quantitative genetics would often work accurately

16 but, very often too, they would show relatively large prediction errors that can not be corrected

17 by better estimating quantitative genetic parameters since they are due to the way the

18 nonlinearity of the GPM.

19

20 Materials and Methods

21 In this work, evolution is modeled using an individual-based algorithm similar to that in (16). 22 The complete model is composed of a developmental model and a population model. The 23 developmental model is used to generate each individual’s phenotype from its genotype. The 24 population model is used to decide the in each generation based on genotypes from 25 the previous generation through selection, mutation, drift and recombination. Each model has a 26 set of parameters, the developmental and populational parameters respectively. Both the

8 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 population and developmental model were written in Fortran 90. The data from the simulations 2 was analyzed and visualized using R. In addition, in each generation, the matrix of additive 3 genetic variances and covariances between traits, G, the matrix of phenotypic variances and 4 covariances between traits, P, and the selection differentials, s, are estimated. From that an 5 expected response to selection is calculated from the multivariate breeder’s equation and 6 compared with the one observed in the simulations for the next generation.

7 Developmental model

8 The developmental model used for our simulations is a computational model of tooth 9 development (15). This model provides an example of a genotype–phenotype map for the 10 morphology of a complex organ. The tooth developmental model is a mathematical 11 representation of the current understanding of the basic gene network behind tooth development, 12 and includes the basic cell behaviors (cell division and cell adhesion), cell mechanical 13 interactions and their regulation by gene products known to be involved in the process. The 14 model is mechanistic in the sense that from this hypothesis and some very simple initial 15 conditions- i.e. a flat epithelium representing the initiation of tooth development- the model 16 reproduces how the morphology and patterns of gene expression in 3D change during 17 development until specific adult tooth morphologies are reached. The dynamics of the model are 18 determined by the value of a set of developmental parameters that specify the magnitudes of 19 several biological and physical properties of the cells, tissues and molecules involved in the 20 process such as, proliferation rate, molecular diffusion rates and regulation interactions between 21 genes. The model's output is the three-dimensional position of tooth cells and, thus, tooth 22 morphology. For a more detailed description of the model we refer the reader to the original 23 publication introducing it (15). The tooth developmental model is suitable for our question 24 because it is able to produce realistic population-level phenotypic variation (15). Furthermore, it 25 is based on developmental biology and therefore includes epigenetic biophysical factors, such as 26 extracellular cell signaling and diffusion in space and mechanical interactions between cells, 27 which have been extensively proposed to lead to complex genotype-phenotype maps (28, 29).

28 For our study, measurable morphological traits needed to be defined in the tooth shape to 29 be tracked through the generations of the population. The phenotypic traits under study were 30 defined to be the spatial position of landmarks in the tooth morphology. Landmarks were located 31 in the three tallest cusps arising in each tooth. The three tallest cusps are the first three cusps to 32 be formed during development (30). Fig. S1 shows the location of the three landmarks in an 33 example morphology. Traits 2 and 4 are the x-positions of the posterior and anterior cusp, 34 respectively. Traits 1, 3 and 5 are the heights of the central, posterior and anterior cusps. Height 35 is measured from the cusp to the baseline of the tooth. The source code and executable files for 36 the model can be found in https://github.com/millisan/teeth-evo

9 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 Population model

2 The developmental model was embedded in a population model. Table S1 shows the 3 populational parameters. We defined a core set of population parameters and studied deviations 4 from those core values, other population parameter sets. For each each parameter set, 20 5 simulations each of 200 generations were ran using different optima to explore the behavior of 6 the system. 7 The population model considers 4 steps per generation: 1) a mapping between genotypes 8 and developmental parameters; 2) a mapping between these parameters and (the 9 developmental model); 3) a mapping between phenotypes and fitness; 4) reproduction (with 10 recombination) and mutation on the genotypes. By iterating steps 1 to 4 in each generation, we 11 simulated how the genotypes and phenotypes of the population change over generations (Fig. 12 S3). 13 14 1) Mapping between genotypes and developmental parameters: The input of the developmental 15 model are the values of the developmental parameters. Each of these parameters corresponds to a 16 developmentally relevant interaction, but not necessarily to a single gene (see (15)). 17 Each individual’s genotype is diploid with a fixed number of loci. Each allele in a loci 18 has a specific quantitative value, its allelic effect. The developmental parameters values are 19 determined as the sum of the contributions of a fixed number of loci (similar to (31)). We 20 purposely use this very simple yet unrealistic mapping because our focus is on how the part of 21 the developmental dynamics we understand the better (i.e the developmental model) affects 22 phenotypic variation and evolution, without confounding effects. There is no since in 23 the mapping from genes to parameters each loci contributes to the value of one and only one 24 developmental parameter. Pleiotropy from genes to phenotype arises from the developmental 25 dynamics which map parameters to phenotypes. 26 27 2) Mapping between developmental parameters and phenotypes: This is accomplished by 28 running the developmental model, for each combination of parameters corresponding to each 29 individual To determine the trait values, each tooth is first centered so that the main cusp is 30 located at x=0. The z component of each landmark is determined as the distance on the z axis 31 from the landmark to the baseline of the tooth (see Fig. S1; as in (15)) 32

10 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 3) Mapping between phenotypes and fitness: Each simulation had a different optimal 2 morphology. For each population parameter set, a total of 20 optima were chosen (see Fig. S2). 3 Each of the 5 morphological traits was chose to increase or decrease in respect to an initial 4 reference morphology (see the “Initial population” section). This leads to a total of 2⁵ = 32 trait 5 combinations. This number reduces to 20 when removing one of two mirror symmetric 6 combinations. Because we wanted selection to act with the same strength on every trait, the 7 optimum value for each trait was set at an equal distance from its mean value in the starting 8 population. The optimum morphology was then determined by adding or subtracting 3 units to 9 the mean value for the trait in the population at generation 1, depending if selection was upwards 10 or downwards, respectively.

11 In each generation, the fitness of each individual in the population was calculated as a 12 function of the distance of its trait values and the optimal trait values as 13

14

15 16 Where is the vector of trait values for the individual, is the element of that vector, is 17 the element of the vector of optimum trait values and is the fitness function. The steepness 18 of the selection gradient is determined by the parameter (see Table S1). 19 The parents of each individual in a generation were chosen at random from the 20 individuals of the previous generation. For each individual in the previous generation, the 21 probability of being chosen as a parent was equal to its fitness divided by the sum of the fitness 22 of all individuals in the population in that generation. In this way we implement both natural 23 selection and genetic drift. 24 25 4) Reproduction, recombination and mutation: Once the parents are selected, they produce one 26 gamete each by randomly selecting one of the two alleles for each loci with equal probability. 27 The gametes of the parents then fuse to form a diploid genome. In the gamete formation process, 28 there is a per-loci probability of mutation, parameter . Mutation is implemented by adding a 29 random number to the mutating loci, drawn from a normal distribution with a zero mean and 30 standard deviation. We call the mutation strength. 31 32 5) Initial population: The population has fixed size. The initial population is the same in all 33 simulations with the same parameter set. Each locus starts with ten equally frequent alleles with 34 allelic effects drawn from a normal distribution. The mean of such distribution was such that 35 when summing over all loci one obtains the reference developmental parameter values, i.e. the 36 developmental parameter values that reproduce the morphology of the postcanine tooth of the 37 ringed seal (Phoca hispida ladogensis). The variance of the distribution of genetic values was set 38 equal to (see 4 Reproduction, recombination and mutation). These genetic values were then 39 used in simulations under stabilizing selection for 100 generations. Stabilizing selection was 40 performed by using the mean traits of the population at the first generation as optimal 41 morphology (see 3 Mapping between phenotypes and fitness). The resulting population was used 42 in the actual simulation experiments as the initial condition.

43

11 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 Quantitative genetic model 2 3 What we call the observed change is the change in trait means in the simulated population 4 between two consecutive generations. This vector is to be compared with the predicted change 5 using the multivariate breeder's equation on the G, P and s estimated from each generation in the 6 simulation, 7 8 9 10 Where is the between-generation change of trait means, G is the matrix of additive genetic 11 variances and covariances, P is the matrix of phenotypic variances and covariances and s is the 12 selection differential. It is known that the parameters of the breeder's equation can change in 13 time, although there is debate regarding how fast this occurs (33-37). To avoid errors arising 14 from outdated estimations, G, P and s were calculated for each generation. Note that , 15 where is the selection gradient i.e. the direct strength of selection acting on each trait. 16 G was estimated in two ways. The first was directly by midparent-offspring regression 17 (2). Animal models were also fitted to the data. In each generation, a nested design was 18 generated from the population. 100 individuals were randomly selected as sires and 200 as dams. 19 Each sire was mated with 2 dams, and 2 offspring were generated from each couple. The nested 20 design allowed to have phenotypic information on full-sibs, half-sibs and parents. The data was 21 fitted with an animal model and restricted maximum likelihood (REML) estimates of additive 22 genetic variance were obtained using the software WOMBAT (38). The animal model used was 23 the simplest possible, 24 25 26 27 With the vector of traits for the individual, the breeding value and the residual term. 28 The variance-covariance matrix for additive genetic values estimated from midparent-offspring 29 regression and the variance-covariance matrix of the traits were used as starting estimates for the 30 random effects. Both methods yielded similar estimates of G. 31 The selection differential s was calculated, in each generation, as the covariance between 32 relative fitness and each trait value (8). 33 34 Statistical Analysis 35 36 Due to the stochastic nature of the evolutionary process, we had to develop ways to reveal the 37 deterministic part of the difference between observed trait change and the trait change predicted 38 from the multivariate breeder’s equations. To do this, we performed repetition simulations. These 39 consist in re-simulating the same generational step multiple times. In each repetition, stochastic 40 processes such as recombination and mutation differ, leading to a stochastic distribution of 41 observed changes, and therefore a distribution of errors. This approach allows to statistically test 42 against the null hypothesis “the mean predicted change equals the mean observed change”. 43 Rejecting the null hypothesis implies that there is a systematic difference between the 44 measurement and the prediction that cannot be attributed to randomness. We refer to such 45 systematic component of the error as bias. 46 For each repetition we can calculate an observed change as the difference in means

12 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 between the original population and the one generated in the repetition. We then compare this 2 distribution of observed changes with the predicted change using the breeder's equation. For all 3 parameter combinations studied, and for all 200 generations of the 20 simulation in each of these 4 combinations, we performed 40 repetitions. This leads to a total of 1760000 simulated 5 generations. 6 Because we are working with a finite population the effect of sampling is not totally 7 eliminated even with this approach. This is why we average the observed and predicted change 8 in a time window of 10 generations. This is equivalent to doing a regression of the response to 9 selection against generations, as typically done in artificial selection experiments (2). We claim 10 there is significant bias when the predicted change deviates from the distribution of observed 11 changes by more than 3 standard errors of the observed mean (this is the 99% confidence in a 12 normal distribution, see Fig. S1). 13 14 15 References and Notes:

16 1. T. Uller, A. P. Moczek, R. A. Watson, P. M. Brakefield, K. N. Laland, Developmental bias

17 and evolution: A regulatory network perspective. Genetics (2018).

18 2. D. S. Falconer, T. F. C. Mackay, Introduction to Quantitative Genetics (Pearson, Fourth

19 Edition,1996).

20 3. D. A. Roff, A centennial celebration for quantitative genetics. Evolution (N. Y). 61, 1017–

21 1032 (2007).

22 4. P. Alberch, Developmental Constraints in Evolutionary Processes. Dahlem Konf. Berl,

23 313–332 (1982).

24 5. I. Salazar-Ciudad, Developmental constraints vs. variational properties: How pattern

25 formation can help to understand evolution and development. J. Exp. Zool. Part B Mol.

26 Dev. Evol. (2006).

27 6. G. B. Müller, Evo-devo: Extending the evolutionary synthesis. Nat. Rev. Genet. (2007).

28 7. P. D. Polly, Developmental Dynamics and G-Matrices: Can Morphometric Spaces be Used

29 to Model Phenotypic Evolution? Evol. Biol. 35, 83–96 (2008).

13 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 8. R. Lande, Quantitative genetic analysis of multivariate evolution, applied to brain:body

2 allometry. Evolution (N. Y). 33, 402–416 (1979).

3 9. Morrissey, M. B., Kruuk, L. E. B., & Wilson, A. J. (2010). The danger of applying the

4 breeder's equation in observational studies of natural populations. Journal of evolutionary

5 biology, 23(11), 2277-2288.

6 10. B. R. Grant, P. R. Grant, Evolution of Darwin’s finches caused by a rare climatic event.

7 Proc. R. Soc. B Biol. Sci. (1993).

8 11. D. A. Roff, D. J. Fairbairn, Predicting correlated responses in natural populations:

9 Changes in JHE activity in the Bermuda population of the sand cricket, Gryllus firmus.

10 Heredity (Edinb). (1999).

11 12. J. M. Cheverud, Quantitative genetics and developmental constraints on evolution by

12 selection. J. Theor. Biol. 110, 155–171 (1984).

13 13. W. G. Hill, M. E. Goddard, P. M. Visscher, T. F. C. Mackay, Data and theory point to

14 mainly additive genetic variance for complex traits. PLoS Genet. (2008).

15 14. T. F. Hansen, Macroevolutionary quantitative genetics? A comment on polly (2008). Evol.

16 Biol. 35, 182–185 (2008).

17 15. I. Salazar-Ciudad, J. Jernvall, A computational model of teeth and the developmental

18 origins of morphological variation. Nature (2010).

19 16. I. Salazar-Ciudad, M. Marín-Riera, Adaptive dynamics under development-based

20 genotype-phenotype maps. Nature (2013).

14 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 17. M. Pigliucci, Genetic variance-covariance matrices: A critique of the evolutionary

2 quantitative genetics research program. Biol. Philos. (2006).

3 18. M. B. Morrissey, Evolutionary quantitative genetics of nonlinear developmental systems.

4 Evolution (N. Y). 69, 2050–2066 (2015).

5 19. S. H. Rice, J. Felsenstein, A general population genetic theory for the evolution of

6 developmental interactions. Proc. Natl. Acad. Sci. (2002).

7 20. M. Turelli, N. H. H. Barton, Dynamics of polygenic characters under selection. Theor.

8 Popul. Biol. 38, 1–57 (1990).

9 21. T. F. Hansen, G. P. Wagner, Modelling genetic architecture: A multilinear theory of gene

10 interaction. Theor. Popul. Biol. (2001).

11 22. I. Salazar-Ciudad, Mechanisms of pattern formation in development and evolution.

12 Development (2003).

13 23. S. Urdy, On the evolution of morphogenetic models: Mechano-chemical interactions and

14 an integrated view of cell differentiation, growth, pattern formation and morphogenesis.

15 Biol. Rev. (2012).

16 24. P. Mitteroecker, The Developmental Basis of Variational Modularity: Insights from

17 Quantitative Genetics, Morphometrics, and Developmental Biology. Evolutionary

18 Biology. (2009).

19 25. P. Beldade, K. Koops, P. M. Brakefield, Developmental constraints versus flexibility in

20 morphological evolution. Nature (2002).

15 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 26. C. E. Allen, P. Beldade, B. J. Zwaan, P. M. Brakefield, Differences in the selection

2 response of serially repeated color pattern characters: Standing variation, development,

3 and evolution. BMC Evol. Biol. (2008).

4 27. E. Hine, K. McGuigan, M. W. Blows, Evolutionary Constraints in High-Dimensional Trait

5 Sets. Am. Nat. (2014).

6 Materials-and-Methods-only references:

7 28. I. Salazar-Ciudad, J. Jernvall, How different types of pattern formation mechanisms affect

8 the evolution of form and development. Evol. Dev. (2004).

9 29. G. Forgacs, S. A. Newman, Biological physics of the developing embryo (Cambridge

10 University Press, 2005).

11 30. J. Jernvall, P. Kettunen, I. Karavanova, L. B. Martin, I. Thesleff, Evidence for the role of

12 the enamel knot as a control center in mammalian tooth cusp formation: Non-dividing

13 cells express growth stimulating Fgf-4 gene. Int. J. Dev. Biol. (1994).

14 31. C. Rolian, “Tinkering with Growth Plates: A Developmental Simulation of Limb Bone

15 Evolution in Hominoids” in Developmental Approaches to Human Evolution (John Wiley

16 & Sons., 2015).

17 32. M. W. Teaford, M. F., Smith, M. M., & Ferguson, Development, Function and Evolution

18 of Teeth (Cambridge University Press, 2007).

19 33. D. Roff, The evolution of the G matrix: Selection or drift? Heredity (Edinb). (2000).

20 34. J. Aguirre, E. Hine, K. Mcguigan, M. Blows, Comparing G: multivariate analysis of

21 genetic variation in multiple populations. Heredity (Edinb). 112, 21–29 (2013).

16 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 35. A. Doroszuk, M. W. Wojewodzic, G. Gort, J. E. Kammenga, Rapid Divergence of Genetic

2 Variance-Covariance Matrix within a Natural Population. Am. Nat. 171, 291–304 (2008).

3 36. T. P. Gosden, S. F. Chenoweth, The evolutionary stability of cross-sex, cross-trait genetic

4 covariances. Evolution (N. Y). (2014).

5 37. A. Penna, D. Melo, S. Bernardi, M. I. Oyarzabal, G. Marroig, The evolution of phenotypic

6 integration: How directional selection reshapes covariation in mice. Evolution (N. Y).

7 (2017)

8 38. K. Meyer, WOMBAT—A tool for mixed model analyses in quantitative genetics by

9 restricted maximum likelihood (REML). J. Zhejiang Univ. Sci. B (2007),

10 doi:10.1631/jzus.2007.B0815.

11 39. M. Turelli, Phenotypic Evolution, Constant Covariances, and the Maintenance of Additive

12 Variance. Evolution (N. Y). (1988).

13 40. A. G. Jones, S. J. Arnold, R. Bürger, Evolution and stability of the G-matrix on a

14 landscape with a moving optimum. Evolution (N. Y). (2004).

15 41. A. B. Gjuvsland, Y. Wang, E. Plahte, S. W. Omholt, Monotonicity is a key feature of

16 genotype-phenotype maps. Front. Genet. 4, 216 (2013).

17 42. B. Hayes, M. E. Goddard, The distribution of the effects of genes affecting quantitative

18 traits in livestock. Genet. Sel. Evol. (2001).

19 43. J. M. Álvarez-Castro, Ö. Carlborg, A unified model for functional and statistical epistasis

20 and its application in quantitative trait loci analysis. Genetics (2007).

17 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 Acknowledgments: We thank Tobias Uller, Asko Mäki-Tanila, Pascal Hagolani, Jukka Jernvall,

2 Juha Merilä, Heikki Helänterä, Mihaela Pavcilev, Roland Zimm, Renske Vroomans and Jhon

3 Alves for useful comments. Funding: This research was funded by the Finnish Academy to ISC

4 (315740 and 272280) and ILS to LM. Authors contributions: Funding acquisition and

5 Supervision: ISC, Conceptualization and Writing: ISC and LM. Investigation, Software,

6 Methodology: mostly LM but also ISC, Formal analysis, Validation and Visualization: LM.

7 Competing interests: The authors declare no competing interests. Data and materials

8 availability: All data is available in the manuscript or the supplementary information. Data and

9 computer code used available at https://github.com/millisan/teeth-evo

10

18 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 Figure legends

2 3 Fig. 1: Observed and predicted change can differ in a systematic way over generations. The

4 plots show observed (gray line) and predicted change (green and red lines) for trait 2 in one

5 simulation (A) and trait 3 in another simulation (B). Predicted change was calculated using the

6 multivariate breeder’s equation and estimates of G and in each generation. Observed change is

7 shown with a 99% confidence interval generated from repeating each generation 40 times to

8 minimize the stochastic effect of drift, mutation and recombination. Both changes plotted above

9 are averaged in a time window of 10 generations for more robust estimates (2) (See Materials and

10 Methods). The predicted and observed change are considered to differ significantly when the

11 former is outside of the confidence interval of the latter (marked with a line with two asterisks).

12 Teeth representative of the population at five time points are included above the plots, with a

13 black segment indicating the traits plotted. Evolution for the remaining traits for both simulations

14 is shown in Fig. S4. Panel B shows an extreme case in which a cusp is lost during evolution.

15 Fig. 2. Bias is common and it can be large. Bias is the disagreement between the observed and

16 predicted change in traits mean in a generation. (A) Percentage of generations with significant

17 bias at 99% confidence for all simulations, and for each trait (significance calculated as in Fig. 1,

18 see Materials and Methods). Prediction for change in the mean value of trait 1 rarely shows bias

19 (median of 3.98%) while predictions for trait 2 and 4 are commonly biased (median of 30%).

20 Simulations where the populations goes through regions of the GPM where small parametric

21 changes lead to loss of cusps, such as the one shown in Fig. 1B, where not included. Therefore

22 the plots include data from 14 complete simulations and 197 generations each (13790 data points

23 including all 5 traits.) (B) Percentage of generations with at least a given amount of relative bias

24 (observed change minus predicted change, divided by the observed change). Only generations

19 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 with a prediction bias with a 99% confidence were used for this plot (a total of 2409 data points

2 out of the total 13790). When considering all traits, half of the generations with significant bias

3 have a relative bias larger than 46%.

4 Fig. 3: Prediction bias arises when the population is in a nonlinear region of the parameter-

5 phenotype map. The figure shows, for three different generations, the midparent-offspring

6 regression (A, C and E) and a marginal parameter-phenotype (B, D and F) depicting how a trait

7 changes (z-axis) when two specific developmental parameters are varied while keeping all others

8 constant at their population mean value. A and B are for trait 4 in generation 90 of the simulation

9 110 in Fig. S4. C and D are for trait 2 in generation 50 of the simulation in Fig. 1A. Bias can be

10 explained by the marginal parameter-phenotype maps (B-D), showing that small changes in

11 parameter values in this region produces relatively large changes in the measured traits. E and F

12 correspond to trait 3 in generation 100 of simulation 107 in Fig. 1B. The marginal parameter-

13 phenotype map (F) shows that, when both parameters are low, the phenotypes produced lack

14 lateral cusps. Bias arises because the linear approach of multivariate breeder’s equation estimates

15 that teeth with smaller values for trait 3 can be produced when in fact they cannot. For panels A,

16 C and E 300 random individuals are plotted as points.

17 Fig. 4: Population genetic parameter have only a modest effect on the bias. The plots above

18 show the median percentages of generations showing significant prediction bias (99%

19 confidence) for each trait, when modifying one population genetics parameter from the core

20 parameter set, keeping the others to the core value (14 simulations per point in the plot,

21 excluding the ones where there is loss of cusp as shown in Fig. 1B). Parameter values are shown

22 in Table S1. In general, our exploration shows a saturating relationship between the evolutionary

23 parameters and the prediction bias. This is, there is a modest dependency of the bias with the

20 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 parameters but the amount of bias reaches a plateau, determined by the characteristic of the GPM

2 which are not dependent of population-level parameters. A detailed explanation of the effect of

3 each parameter on the bias is given in the Supplementary Information (The effect of population

4 parameters on the bias.)

21 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 Figures

3 4 Fig. 1

22 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 Fig. 2

23 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15 Fig. 3

24 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 2

3

4

5

6

7

8

9

10

11

12

13

14 Fig. 4

25 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 2 Supplementary information for: 3 Is evolution predictable? Quantitative genetics under complex genotype-phenotype 4 maps 5 Lisandro Milocco, Isaac Salazar-Ciudad 6 [email protected] 7 [email protected] 8 9 11 This PDF file includes: 12 13 I: Supplementary results 14 II: Supplementary discussion 15 III: Table S1 16 IV: Figs. S1-S12 17 V: Supplementary references 18 19 Other supplementary materials for this manuscript include the following: 20 Movies S1-S2 21 22 23 I: Supplementary results 24 25 To further confirm that prediction bias arises from the nonlinearities of the parameter- 26 genotype map, we performed evolutionary simulations where the GPM was defined by 27 simple mathematical explicit functions. We compare a linear and a nonlinear function. 28 We tested the predictions of the multivariate breeder’s equation for both functions while 29 keeping all other evolutionary parameters constant and equal to the core parameter set 30 (see Materials and Methods). 31 For both mappings, 4 developmental parameters were used and 2 traits were defined. 32 Traits are symbolized as and developmental parameters as (following the notation 33 from Rice, 2002). 34 The linear map used was: 35 36

37

39 The nonlinear map used was: 40 41

42 43 44 Matrices and were randomly generated from a normal distribution with mean 0 and 45 variance 1. The exact values of these matrices had no effect on our results. The

1 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 evolutionary algorithm is exactly the same as described in the Materials and Methods 2 section, with only the GPM being different. Briefly, each individual’s genotype is diploid 3 with a fixed number of loci, and is stored as a list of genetic values (see Materials and 4 Methods). The developmental parameter values are calculated as the sum of a fixed 5 number of loci that contribute to the parameter. Each locus starts with ten equally 6 frequent alleles with allelic effects drawn from a normal distribution with mean of 0 and 7 variance equal to the per-loci mutational variance. These genetic values were then used in 8 simulations under stabilizing selection for 100 generations. The genotypes after those 9 generations were used as initial conditions to the actual simulation runs. 10 A total of 30 simulations with random optima were ran for each map, linear and 11 nonlinear, for 200 generations. Every generations, prediction bias was estimated by re- 12 running the same generational step 40 times, allowing to test against the null hypothesis 13 “the mean observed change equals the predicted change”. 14 The percentage of generations that showed bias with 99% significance are shown 15 in Fig. S7A. All optima and generations were used for the plot. For the linear traits, a 16 median of 1.57% of generations showed bias. This is negligible and can be attributed to 17 estimations error of the G matrix due to finite population size, a phenomenon known as 18 the Turelli effect (Turelli, 1988; Jones et al., 2004). Simulations with the nonlinear map 19 on the other hand show a median of 32% of generations with significant bias. 20 Fig. S7B shows the relationship between local nonlinearity of the parameter- 21 phenotype map and the observed bias, analogous to that shown in Extended Data Fig 6. 22 Only data from simulations using the nonlinear map were included in this plot since 23 simulations with linear map gave nonlinearities of the order of , with median error 24 of . As in Fig. S6, it can be seen that both the spread and magnitude of the 25 bias increase with the nonlinearity of the parameter-phenotype map. Note that this 26 measure of nonlineairty includes both monotonous and nonmonotonous parameter- 27 phenotype maps, sensu Gjuvsland and collaborators (Gjuvsland et al, 2013a). 28 29 II: Supplementary discussion 30 31 In this supplement we discuss how our work relates to previous studies and how our 32 approach and findings are different. Previous research includes general criticisms to the 33 multivariate breeder’s equation approach, models considering the inadequacy of the 34 infinitesimal model, the non-normality of trait distributions, epistasis and nonlinear 35 GPMs, or a combination of these issues. 36 One great advantage of the multivariate breeder’s equation is that it can be used 37 for any phenotype without requiring any knowledge of its genetics or development. Only 38 selection, the trait values in individuals and the genetic relatedness between individuals in 39 a population need to be known. This simplicity, however, comes at the cost of 40 assumptions that may often not hold up. 41 42 General criticisms: 43 44 The multivariate breeder’s equation has been found to be inaccurate in some of the few 45 phenotypes in which observed and predicted changes have been compared (see Roff 2007 46 for a review). Inaccurate measurement of either selection or genetic variance, changing

2 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 environmental conditions, or an incomplete picture of all traits under selection can lead to 2 large inaccuracies in the predictions provided by the multivariate breeder’s equation 3 (Merilä et al. 2001; Barton & Turelli 1989, Hadfield 2008, Wilson et al. 2006; Morrisey 4 et al. 2010). 5 From another perspective, Pigliucci (2006) points out technical problems such as 6 the arbitrariness in defining the traits included in the G matrix and the artificiality in the 7 breeding programs commonly used to estimate G, usually carried out with lab-maintained 8 populations outside their natural conditions. The author also criticizes the G-matrix 9 research program on conceptual grounds, such as the exclusively local value of G (only 10 valid for a given population at a given time point), the highly degenerate mapping 11 between G-matrices and underlying genetic and developmental structures, and the 12 difficulty of discriminating the effects of selection and drift from G structure. 13 Much work in developmental evolutionary biology is implicitly or explicitly 14 critical with quantitative genetics in general and with the breeder’s equations in particular 15 (Alberch, 1982; Müller, 2007; Müller, 2017). These critiques are rooted in the fact that 16 the study of development has consistently shown that the GPM is rather complex and 17 nonlinear (Alberch, 1982; Polly, 2008). From this perspective the linear approach of 18 breeder’s equations is seen as a gross oversimplification, especially at the macro- 19 evolutionary scale. 20 Polly (2008) points out three key consequences that developmental interactions 21 can have on morphological evolution and that evolutionary quantitative genetics cannot 22 fully account for. We found evidence of all three points in our simulations. Firstly, even 23 with continuous variation at the level of underlying developmental parameters, traits may 24 occasionally exhibit small jumps in evolution (see the jump in the position of the 25 posterior cusp in Fig. 1A.). It is important to notice, as we explain in the main text 26 discussion, that these jumps are rather small, of at most 10% of the trait value, and, thus, 27 may not be perceived as trait discontinuities, especially given the environmental effects 28 and measurement errors always present in experiments. Secondly, Polly argues that 29 developmental interactions may limit production of certain phenotypic variants in non- 30 additive ways, leading to a bias in the prediction of the response to selection. This is 31 mirrored in our findings of the prediction bias in certain regions of the GPM. We 32 explicitly show that the structure of additive genetic variances and covariances in the G- 33 matrix does not always summarize all associations between traits, leading to the bias in 34 the prediction. Lastly, Polly mentions that some nonlinearities will lead to loss or 35 appearance of structures. Evidence for this can also be found in our simulations and is 36 associated with problems in the application of the breeder's equations (see Fig. 1B and 37 associated problems when the population is in a region of the GPM where a small change 38 in parameter values leads to loss of the posterior cusp). 39 40 The infinitesimal model: 41 42 One founding assumption of quantitative genetics is the infinitesimal model 43 (Fisher, 1918). According to this model, trait values are determined by a large number of 44 loci of small and fixed effect. By fixed effect we mean an effect that is independent of 45 that of other loci. This assumption does not seem to hold for the traits for which we know 46 something about the underlying loci. First, there is plenty of evidence for alleles of

3 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 relatively large effect and, in fact, for a diversity of magnitudes of effects (Alberch, 1982; 2 Gibson, 1999; Eyre-Walker and Keightley, 2007). Second, as explained through this 3 article and elsewhere, everything we know about development indicates that genes can 4 construct the phenotype and, thus, contribute to their variation because they interact with 5 each other (Waddington, 1957, Wright, 1978, Alberch, 1982, Rice, 2002; Salazar-Ciudad 6 2006; Uller et al. 2018). This strongly suggests that the effects of loci are not fixed, but 7 depend on each other, i.e they are epistatic. 8 In the context of quantitative and population genetics, epistasis between a set of 9 loci is only relevant if these loci are polymorphic. In other words, only variable loci can 10 contribute to trait variation and, thus, have a role in the prediction of the response to 11 selection. When this occurs there is statistical epistasis (Cheverud and Routman 1995, 12 Álvarez-Castro and Carlborg 2007, Hansen 2008, 2013). That two gene products interact, 13 for example, during development, is called functional epistasis (Álvarez-Castro and 14 Carlborg 2007, Hansen 2013 and physiological in Cheverud and Routman 1995). That 15 two genes interact during development does not imply that they are statistically epistatic 16 to each other since they may not be polymorphic. This fact has been used (Hill et al., 17 2008; Hansen, 2008) to argue that the understanding of how genes interact during 18 development, and the resulting nonlinear GPM, is not necessarily relevant for 19 quantitative genetics. This has been taken to imply that, given the general success of 20 quantitative genetics, the infinitesimal model is a good approximation (Hill et al., 2008; 21 Turelli and Barton 1994). As argued in the main text, however, this success is only 22 evident for the case of single traits. In addition, since the vast majority of genes interact 23 with many other genes and allelic variation is common (Wright, 1978), one should expect 24 epistasis to be relevant for the quantitative genetics of most traits. In other words, 25 functional epistasis should often lead to statistical epistasis. This implies that the effects 26 of most loci would not be fixed and, that thus, the infinitesimal model would be hardly 27 tenable. Others, however, argue that either functional epistasis is not so prevalent or that 28 loci are not so polymorphic to make statistical epistasis relevant enough to invalidate the 29 infinitesimal model (Hill et al., 2008). Overall there is a long-lasting controversy about 30 the infinitesimal model and epistasis. The emerging consensus view seems to be that the 31 infinitesimal model is, at best, a convenient idealization that may roughly hold in a 32 number of situations but not in others (Huang and Mackay, 2016). The idealized nature of 33 the infinitesimal model could underlie the so-called missing problem. This is 34 basically the empirical observation that the total sum of the single-locus variation of 35 alleles affecting a given trait accounts for only a small fraction of the total heritability of 36 the trait (Zuk et al., 2012). 37 Strictly speaking the infinitesimal model is not required for some of the inferences 38 made in quantitative genetics. The infinitesimal model was central to quantitative 39 genetics because, originally, quantitative genetics was seen as an extension of multilocus 40 population genetics. Continuous traits were then assumed to be affected by many loci, 41 each of small, fixed and additive effect. This allowed to define things such as additive 42 genetic variance and heritability based on loci effects. This is not possible, however, if the 43 infinitesimal model does not hold. In this case, one can still do quantitative genetics, 44 simply one should treat quantitative genetics as a first order approximation to phenotypic 45 change based on the parent-offspring regression (Rice 2004, 2012). In this case the 46 breeders equations should be interpreted as the linear prediction one can cast on the

4 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 response to selection based on the regression between parents and offspring (or based on 2 covariances in the case of the multivariate breeder’s equation). 3 4 Models with epistasis: 5 6 There have been attempts to develop quantitative genetic frameworks that can 7 accommodate for certain types of epistasis. Hansen and Wagner (2001) introduced a 8 multilinear model in which the effect of an allele substitution in a locus is modelled as a 9 linear combination of locus effects and pairwise epistatic effects among loci. Using this 10 description, the authors develop expressions for usual quantitative genetics quantities 11 such as additive genetic variance, and most relevant to us, a modification of the breeder’s 12 equation that takes into account epistasis. Carter et al 2005 completed this analytical 13 work with simulation studies using the multilinear model, and showed that directional 14 epistasis modifies the response to selection from that expected by the breeder’s equation. 15 A distinctive feature of our work is that we simulate each individual in the 16 population using a mechanistic model of development. The dynamics of this model 17 cannot be framed in terms of the multilinear model. The limiting assumption in the 18 multilinear model is that changing genetic background of a locus causes a scaling of the 19 effects of all substitutions in that locus by the same scale factor. This is not the general 20 case when using the tooth model, as shown in Exteded Data Fig. 9. The equations 21 developed in Hansen and Wagner (2001) for predicting the response to selection are 22 therefore not applicable to our system or to any GPM without this specific scaling. 23 Our model does not have the limitation of the multilinear model but it is not 24 analytically tractable. Indeed, while working in the multilinear model, Carter et al (2005) 25 hints towards the importance of the work we are presenting here to the understanding of 26 the effect of nonlinear interactions on the response to selection: “To study truly nonlinear 27 forms of gene interaction, we have to turn to highly specific architectures, and rely almost 28 exclusively on computer simulations.” 29 30 Non-normal distributions: 31 32 Much theoretical work has been done to study the effects of the departure from the 33 assumption of normality in the distribution of traits. Among the first to do so were Turelli 34 and Barton (1990, 1994). Most notably, they derived an expression for the deviation in 35 the expected response to selection when breeding values are not normally distributed, 36 under the assumption of additivity among loci (i.e. each loci contributing a fixed amount 37 to the final phenotype, so that the phenotype is the sum of each individual loci’s 38 contribution). More recently, Bonamour 2017 used the equations derived in Turelli and 39 Barton 1990 and computer simulations to study the effect of skewness (asymmetry) in 40 phenotypic distributions on the measurements of selection and on the response to 41 selection. 42 Turelli and Barton 1994 conclude that the infinitesimal model is robust to 43 deviations from normality. The conclusion is, however, only valid under the assumption 44 of additivity. Departure from normality in phenotypic distribution in their analysis 45 therefore does not arise from nonlinear genetic interactions but other effects such as 46 environment, recombination and . Their approach is, thus,

5 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 fundamentally different to the one presented in this paper, where the focus is in the GPM 2 itself. 3 4 Models with nonlinearities: 5 6 Rice’s work (2004) shows that breeder’s equations assumes the local linearity of GPM 7 and that the predictions of the multivariate breeder’s equation could be inaccurate under 8 complex nonlinear GPMs. This work demonstrates that this could happen, but does not 9 include an understanding of actual GPMs from which to evaluate how large and frequent 10 should these inaccuracies be. 11 Rice (2004) developed a mathematical population framework to study the 12 dynamics of adaptation under arbitrary GPMs, including nonlinear ones. This framework 13 requires no assumptions about the interactions between loci nor about trait distributions. 14 This framework, however, assumes that the GPM is known. Based on the derivatives of 15 the traits with respect to some underlying factors (e.g. genes or developmental 16 parameters) and the moments of the trait distribution, Rice develops formulae to predict 17 the response to selection. 18 Morrissey and collaborators (Morrissey 2015, de Villemereuil 2016) use 19 mathematical functions to map normally distributed genetic and environmental factors to 20 traits, similarly to Rice. Classical quantitative genetics can be applied at the level of the 21 underlying factors and then the results can be translated back to the traits by using the 22 functions. This allows to estimate classical parameters such as additive genetic variance 23 of the traits. The authors also developed a predictive equation for the response to 24 selection for nonlinear GPMs. As in the case of Rice work this mathematical description 25 of the GPM does not arise from the theory itself but needs to be provided from elsewhere. 26 Our approach is intrinsically different: it has a theory on how the GPM is and then 27 explores its evolutionary consequences. It is important to notice that our theory of the 28 GPM is coming from a theory of development and, thus, it is based on a completely 29 different sort of evidence and logic than Rice’s work, models with epistasis and 30 quantitative genetics in general. 31 In addition, the function used Morrissey’s work is assumed to be invertible. This 32 restricts the application of the method to one-to-one mappings between genetic and 33 phenotypic variation. This is problematic since most GPMs are known to exhibit many- 34 to-one mappings (Wagner, 2011), including the one of the tooth model (Salazar-Ciudad 35 and Marin-Riera, 2013). 36 Using Morrissey’s approach, De Villemereuil (2016) shows that systematic biases 37 in the predictions of the breeder’s equation can be encountered when the mapping 38 function used is logarithmic. This prediction bias is similar to the one we describe in this 39 work. In here, however, we do not impose a function mapping parameters to phenotypes. 40 Rather, the mapping arises from the dynamics of the developmental model and changes 41 with time. Because the mapping from inputs to traits changes with time, we observe 42 dynamics in which prediction bias changes both quantitatively and qualitatively, even 43 finding regions where there is no prediction error, despite the developmental function 44 being nonlinear (see Exteded Data Figure 6, where high nonlinearities can be associated 45 with negligible bias.)

6 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 The approaches just described do not aim to understand the GPM, they aim to 2 understand how the GPM may affect trait adaptation. In these approaches the GPM is 3 given as an explicitly defined function between genotype, or other underlying factors, and 4 traits. Many mathematical approaches that aim to understand the GPM use a quite 5 different approach that models the network of basic gene product and cell interactions 6 involved in the production of the phenotype, such as in development or physiology. As a 7 result of this modelling of the “microscopic” rules a specific relationship between genetic 8 and trait variation arises, a GPM. 9 Gjuvsland and collaborators (Gjuvsland 2011, Gjuvsland 2013a, Gjuvsland 10 2013b) take one such alternative approach. They study the adequacy and predictive value 11 of some central concepts in quantitative genetics in the context of a realistic nonlinear 12 GPM. In that sense their approach is similar to ours, they simply address a different set of 13 questions and use simpler models based on physiology, rather than on development, and 14 deal with single traits. As in the current work, they emphasize the importance of 15 describing the GPM as arising from dynamic physiological systems. They found that 16 gene networks with and without feedback motifs can exhibit differences at the level of 17 how trait variance is partitioned and at the level of the number of apparent loci 18 underlying trait variation. Although all gene network motifs were found to lead to 19 statistical epistasis, positive feedback were found to do it much more that any other 20 network motif. In addition, positive feedbacks and cyclic dynamics gave more non- 21 monotone GPMs (i.e. maps where order is not preserved between allele content and 22 corresponding genotypic value) and much lower than those without. 23 In a recent study by Van Dooren (2018) found no prediction bias when applying 24 the breeders equations to two traits in the eyespots of the butterfly. The currently 25 accepted hypothesis for eyespot development is based on a gradient model for the 26 concentration of a diffusible signal (Dilão and Sainhas, 2004). Cells in the center of the 27 eyespot secrete a morphogen; the color and size of the eyespot is determined by 28 thresholds to the concentration of the morphogen. The formation of the eyespot as we 29 understand it can therefore be reduced to a one-dimensional process, without major cell 30 rearrangements co-occurring with the patterning. This process is therefore unlikely to 31 produce nonlinearities as large as the ones we find for cusp patterning in the tooth 32 development model. 33 34 How important are GPM nonlinearities in a population context? 35 36 It has been proposed that nonlinearities in the GPM are not relevant, at least in the short 37 term. Quoting (Hansen 2008): “Sometimes biologists without statistical training will 38 dismiss the additive model with the argument that genes exert their effects through highly 39 nonlinear physiological interactions. It is almost guaranteed, however, that the additive 40 model will be a good local approximation to the GPM for segregating genotypes. This is 41 due to the statistical definition of gene effects as averages over the genotypic 42 combinations in which they occur. Even if the effect of a particular gene substitution may 43 be different in different specific genetic backgrounds, the averaging over all backgrounds 44 minimizes the variation.”. Although it is true that such minimization occurs, there is no 45 reason to assume it would necessarily be strong enough to make the nonlinearity of the 46 GPM irrelevant for short term predictions. This depends on how strong such

7 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 nonlinearities are and, to a lesser extent, on the population size and recombination rates 2 (see section The effect of populaton parameters on the bias below). In fact, our results 3 explicitly show that this is not the case, even with large populations in linkage 4 equilibrium (see Fig. S12). The GPM is often nonlinear enough to lead to significant 5 deviations between observed changes and predicted changes in the short term. 6 To understand, from a quantitative genetics perspective, why breeder’s equations 7 may still fail despite the averaging over genetic backgrounds, we must distinguish the 8 functional, individual-level description of the GPM and the statistical, population-level 9 one used in quantitative genetics. The marginal parameter-phenotype maps shown in our 10 work (Fig. 3) are made at the individual level: interactions at the molecular, cellular and 11 tissue level that result in individual variation. It is independent of the allelic variants 12 present in the population, and purely determined by the developmental and genetic 13 system behind the development model. The parameter-phenotype map at the population 14 level is evidently related to the individual-level one (Alvarez Castro and Carlborg 2007), 15 but it is weighted by the allelic frequencies existing in the population. At the population 16 level, the map describes how a change in parametric values result in an average change in 17 the mean phenotypic value in the population. Since in our model the mapping between 18 loci and developmental parameter values is additive, we can assume that the parameter 19 values are distributed normally in the population. The population-level parameter- 20 phenotype map can then be thought of as the individual-level map passed through a 21 Gaussian filter, as shown in Fig. S12. Such smoothing reduces the ruggedness compared 22 to the individual-level map, and does so depending on the spread of the parameter values 23 in the population (a proxy for allelic distributions). This population-level smoothing can 24 be enough for the map to be adequately described locally by a plane, leading to accurate 25 description by multivariate breeder’s equation. This will be the case if the individual- 26 level map is already linear (as in Fig. S12A) and even sometimes when it is nonlinear (as 27 in Fig. S12C). As we show in this work however, the smoothing may not be enough for a 28 relatively high percentage of generations when large nonlinearities at the individual level 29 result in large nonlinearities at the population level (as in Fig. S12E). In terms of average 30 genetic effects, this means that certain nonlinearities at the individual-level will result in 31 gene substitutions with effects that are highly dependent on the genetic background. For a 32 given gene substitution we can think of an associated distribution of genetic effects across 33 the different backgrounds. The shape of this distribution will depend on both the 34 substitution itself and the genetic backgrounds present in the population. When reducing 35 this distribution of effects to its mean value – as when calculating the average genetic 36 effect – we end up with a statistic that poorly represents the distribution of effects. This 37 ultimately leads to predictions based on additive effects being inaccurate, even for single- 38 generation predictions. 39 40 The effect of populaton parameters on the bias: 41 42 Prediction bias was found across all population genetic parameter sets studied (see 43 Materials and Methods and Fig. 4). In general, our exploration shows a saturating 44 relationship between the evolutionary parameters and the prediction bias. This is, there is 45 some dependency of the bias with the parameters but the amount of bias reaches a 46 plateau, determined by the characteristic of the GPM which are not dependent of

8 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 population-level parameters. The characteristics of the GPM are determined by 2 development only, the population genetics parameters affect bias indirectly by affecting 3 how fast the population moves in the developmental parameter space and, thus, the 4 likelihood of finding a region of such space with a complex GPM. 5 The effect of selection strength was studied by deviating the factor in the 6 exponent of the selection function. In each generation, the fitness of each individual in 7 the population was calculated as a function of the distance of its trait values and the 8 optimal trait values as 9

10

11 12 Where z is the vector of trait values for the individual, is the element of that 13 vector, is the element of the vector of optimum trait values and is the fitness 14 function. Therefore, the closer an individual is to the optimum morphology, the higher its 15 fitness. The steepness of the selection gradient is determined by the parameter (see 16 Materials and Methods). Stronger selection makes the population move faster in the 17 developmental parameter space. This allows for it to explore larger areas of the 18 developmental parameter space and therefore increasing the chances of passing through 19 highly nonlinear regions of it. 20 Mutation rate is the per-allele probability of being mutated. Larger mutation rates 21 lead to larger percentages of significant bias. This is explained by the fact that more 22 lead to larger exploration of the GP map and bigger probability of reaching 23 problematic regions. The core and low mutation rates do not show large differences. 24 Mutation strength determines how large is the phenotypic effect of a mutation in 25 respect to a reference value. Tinkering this parameter leads to a a similar plateau-like 26 response in the amount of significant bias as seen when tinkering the mutation rate. 27 Larger populations show larger percentage of significant bias. This can be 28 explained by two facts. First, larger populations explore larger proportions of the 29 developmental parameter space, and therefore they have larger chances of encountering 30 nonlinear regions. On the other hand, smaller populations experiment larger drift effects 31 and it therefore becomes harder to confidently say that the prediction is different to the 32 observed change in our repetition experiments. In this latter case, we detect bias less 33 often because there is less statistical power to detect it. 34 Finally, there is not a clear dependency between the amount of bias and the 35 number of loci additively determining each parameter. This is due to two counteracting 36 effects. On the one hand, with a fixed per-allele mutational rate, more loci imply more 37 mutations per genome. On the other hand, since more loci imply that each of them 38 contributes less to the parameter value, fixed mutational strength means each mutation is 39 smaller. This means that when increasing the number of loci determining each parameter, 40 there are more mutations of smaller effect, and eventually to no big changes in the 41 amount of bias detected.

9 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 III. Supplementary tables 2 Population size (N_pob) 100 600 1000

Number of haploid loci (N_loci) 42 105 210

Mutation standard deviation (tau) 0.02 0.1 0.5

Allelic mutation rate (muta) 0.01 0.001 0.0001

Selection strength (alpha) 0.125 0.05 0.01 3

4 Table S1. Parameter values for the simulations. The middle columns shows the core 5 population parameter set. Simulations with all optima were ran by changing one of the 6 population parameters above at a time. This leads to a total of 11 population parameter 7 combinations ran, and a total of 220 simulations.

10 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 IV. Supplementary Figures

3 Fig. S1: Example 3D morphology arising from the developmental model. In the left we 4 mark (black balls) the landmarks used to define the traits (right) on which natural 5 selection is applied. The height, y-coordinate of the landmarks, is calculated as the 6 difference from the mean y position of the cells in the tooth margin (as in 16). 7

11 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 2 3 Fig. S2: List of optima used in the simulations. Optima are shown as deviations from the 4 initial morphology, with arrows symbolizing the direction of selection for each of the 5 three cusps. Each of the 5 traits defined in Fig. S1 can be selected to increase or decrease, 6 leading to a total of 2⁵ = 32 trait combinations. We have removed the ones are mirror 7 images of others. Because we wanted selection to act with the same strength on every 8 trait, the optimum value for each trait was set at an equal distance from its mean value in 9 the starting population. The optimum morphology was then determined by adding or 10 subtracting 3 units to the mean value for the trait in the population at generation 1, 11 depending if selection was upwards or downwards, respectively. 12 13 14 15

12 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 2 3

6 7 Fig. S3: Evolutionary algorithm. The scheme shows the steps of the population model 8 algorithm used in our simulations. Each step is explained in the text. 9

13 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

2 3 Fig. S4 (CONT.)

14 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

2 3 Fig. S4 (CONT.)

15 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

2 3 4 Fig. S4 (CONT.) 5

16 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

2 3 4 5 Fig. S4: Observed and predicted change can differ in a systematic way over 6 generations.As figure 1 but for all simulations with the 20 different optima in the core 7 parameter set (see Table S1).

17 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

2 Fig. S5: Prediction bias arises when the population is in a nonlinear region of the 3 parameter-phenotype map. As Fig. 3 but for other examples. 4

18 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Fig. S6. Prediction bias correlates with the local non-linearity of the genotype- 21 phenotype map. Absolute bias versus local nonlinearity of the parameter-phenotype map 22 for all simulated generations showing a significant bias (see Fig. 2). Both the 23 frequencyspread and magnitude of the bias increase with the nonlinearity of the 24 parameter-phenotype map. Each point represents the bias and nonlinearity in a trait, in a 25 generation of a simulation in the core set. Points were grouped into bins and a boxplot 26 was made for each bin. Local nonlinearity was measured based on how well a linear map 27 describes the relationship between parameters and traits. A best linear fit (B from the 28 vector of mean developmental parameters in the population (u) to the vector of observed 29 traits (z) is found by least squares, using the data from all individuals in the population in 30 that generation: Where T is the transpose operator. Both phenotype and 31 trait values are expressed as deviations from the mean. The root-mean-square error of the 32 fit is taken as a measure of nonlinearity for each trait, i.e. 0 means the map is perfectly 33 described by the linear fit.

34 35 36

19 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

2 3 Fig. S7: Bias is common for any non-linear GPM. The plots shows the percentage of 4 generations with significant bias and a binned scatter-plot of local non-linearity of the 5 parameter-phenotype map against prediction bias for two analytical GPMs. A shows the 6 distribution of percentage of generations with significant bias (99%) for all 30 7 simulations ousing the linear and the nonlinear GPMs. When the GPM is linear there is 8 negligible amount of detected bias (median of 1.57% of generations). When the GPM is 9 nonlinear, prediction bias is very common. B shows, for the non-linear map, the absolute 10 prediction bias against the local nonlinearity of the parameter-phenotype map as shown in 11 Fig. S6. Both the spread and mean amount of bias increases the nonlinearity of the 12 parameter-phenotype map. 13

20 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 2 3 4

5 Fig. S8: The relationship between parents and offspring trait values is not always 6 linear, the population is distributed over a region of the parameter space that leads 7 to different trait values in a non-gradual way: Marginal parameter-phenotype maps 8 including 100 individuals in the population as black points. Individuals do not always fall 9 on the surface of the plot because of variation in parameters that have not been plotted 10 (that is why we call these maps marginal). 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32

21 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 2

22 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

2 Figure S9: Bias can arise because variation does not exist beyond a trait value but 3 the linear transformation in the breeder’s equations infers that such variation 4 should exist. Idealized univariate example of how a limitation in a trait value leads to 5 bias in the breeder’s predictions. An actual example can be seen in the dashed line of 6 Figure 3G. This example allows to more clearly explain one way in which bias arises. A 7 and B show the parent-offspring regression of a trait (although the same could occur 8 between traits). The dashed vertical and horizontal red lines show the value over which a 9 trait can not pass due to developmental dynamics. Simply the population is in a region of 10 the developmental parameter space where this trait is always below this value in all 11 possible phenotypes. Regions of that space that are far away from where the population is 12 now may be able to produce larger values in this trait. The green line shows the 13 regression between parents and offspring. From this regression one would infer that trait 14 values over the dashed vertical red line should be possible while they are not, red 15 ellipsoid. B shows the same that A but for a population that has encountered this bias 16 long ago and in which natural selection favours large values in the trait. Natural selection 17 pushes the population towards the dashed vertical red line but due to the always present 18 mutation and recombination there are always individuals with lower values of the trait. 19 Because of that not all individuals are on the limit of the possible values (dashed vertical 20 red line) and a non-zero regression exist. Because of this regression breeder’s prediction 21 is still that the trait should increase beyond the non-possible trait value, thus, the bias. 22

23 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 Fig. S10: The tooth model is not compatible with the assumptions of the multilinear 2 model The multilinear model developed by Hansen and Wagner21 assumes that changing 3 the genetic background for a locus causes a linear transformation of the effects of all 4 substitutions in that locus. The figure shows an example where this does not hold using 5 the tooth model: two allelic substitutions are scaled by different factors when changing 6 the genetic backgrounds. In the example we measure trait 2, the x-position of the 7 posterior cusp. We quantify the difference in the value of this trait when making two 8 allele substitution, with respect to the phenotype of the reference allele (A1). Included in 9 the plot are outlines of the teeth morphologies. Outlines in black correspond to the 10 reference morphologies. Colored outlines correspond to the morphologies obtained after 11 the allelic substitutions, in the two backgrounds. A black segment indicates the difference 12 in the trait value between the reference morphology and the one produced after the allelic 13 substitution, which is also the y-axis in the plot. 14 The allelic substitution A1->A2 has almost the same phenotypic effect in both 15 backgrounds. The substitution of A1->A3 however, produces a much larger effect in 16 Background 1 than in Background 2, violating the assumptions of the multilinear model. 17 We denote as the genetic effect of substituting allele with allele on 18 genetic background . For a reference allele and for two genetic backgrounds

24 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 and , the multilinear model assumes that there exists a linear transformation such 2 that for every allele , 3 4 Equivalently, it holds that for every ,

5

6 Assuming 7

8 In the example, this does not hold, 9 10

12 13

25 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

2 Fig. S11. The effect size of mutations falls within a realistic range. Distribution of the 3 effect size of mutation on traits relative to trait’s standard deviation (for all 20 simulations 4 in the core parameter set and sampled every 20 generations, starting from generation 10 5 to generation 190). First, the average value for each loci was calculated by averaging over 6 the values present in the population. This results in a list of mean loci values, for all loci 7 in the population. This list corresponds to the “reference individual”, and its 8 corresponding phenotype is calculated by running the tooth development model. From 9 the genotype of the reference individual, each loci is mutated, one at a time, with a 10 mutation strength equal to the value in the core parameter set. This creates a set of single- 11 loci mutants. The phenotype of each mutant is generated using the tooth development 12 model. Finally, the deviation from each mutant to the reference individual is measured for 13 each trait, and divided by the standard deviation for that trait in the population. Panel A 14 shows the distribution of these effects for all traits, including all mutants generated for all 15 simulation and generations sampled. The mean effect is shown with a red line, and equals 16 0.04 standard deviations. Because most effects are very small, larger effects cannot be 17 seen from the plot. Panel B shows only the effects that are larger than 0.3 sd, which can 18 be considered large effects. The distribution found is compatible with others previously 19 reported QTL distributions , so the mutation strength in the core parameter set is not 20 unrealistic. Note that our estimate for mean effect is close to 0 because we can detect 21 even the smallest effects, which is impossible to do with real-life data. The data fits the 22 expectation of an exponential-like distribution of effects sizes.

26 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 2 3

4 Fig. S12: Individual- and population-level parameter-phenotype maps. Individual- 5 level parameter-phenotype maps (A, C and E) were constructed to see the phenotypic 6 effects of changing two parameters while keeping all the others constant. Here, all-but- 7 two parameters were fixed to their mean value in the population at the generation plotted. 8 The remaining parameter was varied around their mean values and within the limits of 9 values existing in the population. The trait values of the resulting morphologies were 10 measured and plotted as the z axis. Population-level maps (B, D and F) were constructed 11 by running a Gaussian smoother through the individual-level map. These maps therefore 12 indicate the average trait value for an average parameter value. We assume a Gaussian 13 distribution of parameter values in the population. The plot shows that if the individual- 14 level map is linear (A), the population-level one will be as well (B). If the individual- 15 level map is nonlinear however (C and E), the population-level map may or may not 16 show significant nonlinearities (D and F), depending on the magnitude of the individual- 17 level nonlinearities and the spread of the parameters in the population.

27 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 V. Supplementary references: 2 3 (Few references are repeated from the main text to make the Supplementary 4 Information self-contained) 5 6 Alberch, Developmental Constraints in Evolutionary Processes. Dahlem Konf. Berl, 313– 7 332 (1982). 8 9 Álvarez-Castro, J. M., & Carlborg, Ö. (2007). A unified model for functional and 10 statistical epistasis and its application in QTL analysis. Genetics. 11 12 Barton, N. H., & Turelli, M. (1989). Evolutionary quantitative genetics: how little do we 13 know?. Annual review of genetics, 23(1), 337-370. 14 15 Bonamour, S., Teplitsky, C., Charmantier, A., Crochet, P. A., & Chevin, L. M. (2017). 16 Selection on skewed characters and the paradox of stasis. Evolution, 71(11), 2703-2713. 17 18 Carter AJ, Hermisson J, Hansen TF. The role of epistatic gene interactions in the response 19 to selection and the evolution of . Theor Popul Biol. 2005 Nov;68(3):179-96. 20 21 Cheverud, J. M., & Routman, E. J. (1995). Epistasis and its contribution to genetic 22 variance components. Genetics, 139(3), 1455-1461. 23 24 de Villemereuil, P., Schielzeth, H., Nakagawa, S., & Morrissey, M. (2016). General 25 methods for evolutionary quantitative genetic inference from generalised mixed 26 models. Genetics, genetics-115. 27 Dilão, R., & Sainhas, J. (2004). Modelling butterfly wing eyespot patterns. Proceedings of the Royal Society of London. Series B: Biological Sciences, 271(1548), 1565-1569.

Eyre-Walker A, Keightley PD. The distribution of fitness effects of new mutations. Nat Rev Genet. 2007 Aug;8(8):610-8. 28 29 Forgacs,G. and Newman,S. (2005) Biological Physics of the Developing Embryo. 30 Cambridge University Press, Cambridge, UK. 31 32 Gibson G. 1999. Insect evolution: redesigning the fruitfly. Curr Biol 9:R86–R89. 33 34 Gjuvsland, A. B., Vik, J. O., Woolliams, J. A., & Omholt, S. W. (2011). Order‐preserving 35 principles underlying genotype–phenotype maps ensure high additive proportions of 36 genetic variance. Journal of evolutionary biology, 24(10), 2269-2279. 37 38 Gjuvsland, A. B., Wang, Y., Plahte, E., & Omholt, S. W. (2013a). Monotonicity is a key 39 feature of genotype-phenotype maps. Frontiers in genetics, 4, 216. 40

28 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 Gjuvsland, A. B., Vik, J. O., Beard, D. A., Hunter, P. J., & Omholt, S. W. (2013b). 2 Bridging the genotype–phenotype gap: what does it take?. The Journal of 3 physiology, 591(8), 2055-2066. 4 5 Hadfield, J. D. (2008). Estimating evolutionary parameters when viability selection is 6 operating. Proceedings of the Royal Society of London B: Biological 7 Sciences, 275(1635), 723-734. 8 9 Hansen, T.F., and Wagner, G.P. 2001. Modeling genetic architecture: a multilinear theory 10 of gene interaction. Theor. Popul. Biol. 59: 61-86. 11 12 Hansen, T. F. (2008). Macroevolutionary quantitative genetics? A comment on Polly 13 (2008). Evolutionary Biology, 35(3), 182-185. 14 15 Hansen, T. F. (2013). Why epistasis is important for selection and adaptation. Evolution, 16 67(12), 3501-3511. 17 18 Hill, W. G., Goddard, M. E., & Visscher, P. M. (2008). Data and theory point to mainly 19 additive genetic variance for complex traits. PLoS genetics, 4(2). 20 21 Huang W, Mackay TFC (2016) The Genetic Architecture of Quantitative Traits Cannot 22 Be Inferred from Variance Component Analysis. PLoS Genet 12(11): e1006421. 23 24 Jones, A. G., Arnold, S. J., & Burger, R. (2004). Evolution and stability of the G‐matrix 25 on a landscape with a moving optimum. Evolution, 58(8), 1639-1654. 26 27 Merilä, J., Sheldon, B. C., & Kruuk, L. E. B. (2001). Explaining stasis: 28 microevolutionary studies in natural populations. Genetica, 112(1), 199-222. 29 30 Morrissey MB, Kruuk LE, Wilson AJ.The danger of applying the breeder's equation in 31 observational studies of natural populations. J Evol Biol. 2010 Nov;23(11):2277-88. 32 33 Morrissey, M. B. (2015). Evolutionary quantitative genetics of nonlinear developmental 34 systems. Evolution, 69(8), 2050-2066. 35 36 Müller, G. B. (2007). Evo–devo: extending the evolutionary synthesis. Nature reviews 37 genetics, 8(12), 943. 38 39 Müller, G. B. (2017). Why an extended evolutionary synthesis is necessary. Interface 40 focus, 7(5), 20170015. 41 42 Pigliucci, M. (2006). Genetic variance–covariance matrices: a critique of the evolutionary 43 quantitative genetics research program. Biology and Philosophy, 21(1), 1-23. 44 Polly, P. D. (2008). Developmental dynamics and G-matrices: Can morphometric spaces be used to model phenotypic evolution?. Evolutionary biology, 35(2), 83.

29 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 Rice, S. H. (2002). A general population genetic theory for the evolution of 2 developmental interactions. Proceedings of the National Academy of Sciences, 99(24), 3 15518-15523.

4 Rice, S. H. (2004). Developmental associations between traits: covariance and beyond. 5 Genetics, 166(1), 513-526.

6 Rice, S. H. (2012). The place of development in mathematical evolutionary theory. 7 Journal of Experimental Zoology Part B: Molecular and Developmental Evolution, 8 318(6), 480-488.

9 Roff, D. A. (2007). A centennial celebration for quantitative genetics. Evolution: 10 International Journal of Organic Evolution, 61(5), 1017-1032. 11 12 Salazar‐Ciudad, I. (2006). Developmental constraints vs. variational properties: how 13 pattern formation can help to understand evolution and development. Journal of 14 Experimental Zoology Part B: Molecular and Developmental Evolution, 306(2), 107-125. 15 16 Salazar-Ciudad, I., & Marín-Riera, M. (2013). Adaptive dynamics under development- 17 based genotype–phenotype maps. Nature, 497(7449), 361. 18 19 Turelli, M. (1988). Phenotypic evolution, constant covariances, and the maintenance of 20 additive variance. Evolution, 42(6), 1342-1347. 21 22 Turelli, M., & Barton, N. H. (1990). Dynamics of polygenic characters under 23 selection. Theoretical Population Biology, 38(1), 1-57. 24 25 Turelli, M., & Barton, N. H. (1994). Genetic and statistical analyses of strong selection 26 on polygenic traits: what, me normal?. Genetics, 138(3), 913-941. 27 28 T. Uller, A. P. Moczek, R. A. Watson, P. M. Brakefield, K. N. Laland, Developmental bias 29 and evolution: A regulatory network perspective. Genetics (2018). 30 31 Van Dooren, T., Beldade, P., & Allen, C. (2018). Predicting and Analyzing the Response 32 to Selection on Correlated Characters. BioRxiv, 348896. 33 34 Waddington, C.H. (1957) The Strategy of the Genes: A Discussion of Some Aspects of 35 Theoretical Biology. George Allen and Unwin, London. 36 37 Wagner, A. 2011. Genotype networks shed light on evolutionary constraints. Trends Ecol. 38 Evol. 26: 577-584. 39 40 Wilson, A. J., Pemberton, J. M., Pilkington, J. G., Coltman, D. W., Mifsud, D. V., 41 Clutton-Brock, T. H., & Kruuk, L. B. (2006). Environmental coupling of selection and 42 heritability limits evolution. PLoS biology, 4(7), e216.

30 bioRxiv preprint doi: https://doi.org/10.1101/578021; this version posted March 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 2

1 2 Wright S. 1978. Evolution and the genetics of populations, vol. Variability within and 3 among natural populations. Chicago: University of Chicago Press. 4 5 Zuk, O., Hechter, E., Sunyaev, S. R., & Lander, E. S. (2012). The mystery of missing 6 heritability: Genetic interactions create phantom heritability. Proceedings of the National 7 Academy of Sciences, 109(4), 1193-1198. 8 9

31