<<

Copyright © 2011 by the Society of America DOI: 10.1534/genetics.111.127118

Quantitative Through Epigenomic Perturbation of Isogenic Lines

Frank Johannes*,1 and Maria Colomé-Tatché†,2 *Groningen Bioinformatics Centre, University of Groningen, Nijenborgh 7, 9747 AG Groningen, The Netherlands and †Institute for Theoretical Physics, Leibniz Universität Hannover, Appelstr. 2, D-30167, Hannover, Germany Manuscript received January 22, 2011 Accepted for publication February 28, 2011

ABSTRACT Interindividual differences in chromatin states at a locus (epialleles) can result in expression changes that are sometimes transmitted across generations. In this way, they can contribute to heritable phenotypic variation in natural and experimental populations independent of DNA sequence. Recent molecular evidence shows that epialleles often display high levels of transgenerational instability. This property gives rise to a dynamic dimension in phenotypic inheritance. To be able to incorporate these non-Mendelian features into quantitative genetic models, it is necessary to study the induction and the transgenerational behavior of epialleles in controlled settings. Here we outline a general experimental approach for achieving this using crosses of epigenomically perturbed isogenic lines in mammalian and plant species. We develop a theoretical description of such crosses and model the relationship between epiallelic instability, recombination, parent- of-origin effects, as well as transgressive segregation and their joint impact on phenotypic variation across generations. In the limiting case of fully stable epialleles our approach reduces to the classical theory of experimental line crosses and thus illustrates a fundamental continuity between genetic and epigenetic inheritance. We consider data from a panel of Arabidopsis epigenetic recombinant inbred lines and explore estimates of the number of quantitative trait loci for plant height that resulted from a manipulation of DNA methylation levels in one of the two isogenic founder strains.

YSTEMATIC or stochastic changes in chromatin epigenetic inheritance in the context of human health, S states, such as gains or losses of DNA or histone , and agriculture have remained largely specu- methylation, are sometimes transmitted across genera- lative ( Johannes et al. 2008; Richards 2008; Bossdorf tions with significant phenotypic effects (Richards et al. 2008; Petronis 2010; Biemont 2010). To overcome 2006). Since different chromatin variants (epialleles) this limitation, it is necessary to obtain a basic inventory can exist in the same sequence background (i.e., on of the transgenerational behavior of epialleles in both the same sequence ), they can produce a dimen- mammals and plants and to formally incorporate these sion of functional variation at the population level that properties into our current models of quantitative in- cannot be captured by an analysis based on DNA se- heritance in natural and experimental populations quence alone ( Johannes et al. 2008). How much of (Johannes et al. 2008). The aim of this article is to this epigenetic variation is routinely missed in linkage outline both experiment and theory to achieve this. or association mapping studies is an open question A powerful experimental approach for studying the (Johannes et al. 2008; Maher 2008; Manolio et al. induction and propagation of epigenetic variation is ichler 2009; E et al. 2010), but preliminary estimates through crosses of epigenomically perturbed isogenic in plants suggest that it can account for up to 30% of strains. In the model plant Arabidopsis two groups have the variation in commonly studied such as recently implemented such an approach by constructing fl ohannes height and owering time ( J et al. 2009). so-called epigenetic recombinant inbred lines (epiRILs) Unlike DNA sequence , epialleles can exhibit (Johannes et al. 2009; Reinders et al. 2009). These pop- akyan a high degree of instability across generations (R ulations were derived from crosses between individuals athieu et al. 2002; M et al. 2007). Because of these with virtually identical DNA sequences but drastically di- dynamic properties the quantitative implications of vergent epigenomic profiles. In both cases, the cross was initiated from a wild-type (wt) plant and a plant carrying Supporting information is available online at http://www.genetics.org/ ohannes cgi/content/full/genetics.111.127118/DC1. a single loss-of-function in ddm1 (J Available freely online through the author-supported open access option. et al. 2009) or met1 (Reinders et al. 2009), two 1Corresponding author: Groningen Bioinformatics Centre, University of involved in DNA methylation control. As a result, mutant Groningen, Nijenborgh 7, 9747 AG Groningen, The Netherlands. plants exhibit significant global changes in DNA methyl- E-mail: [email protected] ongs okus ister 2Present address: Groningen Bioinformatics Centre, University of Gro- ation (V et al. 1993; C et al. 2008; L et al. ningen, Nijenborgh 7, 9747 AG Groningen, The Netherlands. 2008; Reinders et al. 2009). Although mobilization of

Genetics 188: 215–227 (May 2011) 216 F. Johannes and M. Colomé-Tatché transposable elements also occurs in these epiRILs, they nonetheless provide a unique opportunity to study the transgenerational behavior of induced epigenetic variation against a nearly invariant DNA sequence background (Fig- ure 1A). The experimental setup for constructing such pop- ulations represents a general strategy. Similar approaches could be considered in mammals and/or through the use of environmental triggers to initiate the epigenomic changes in the parental generation (Figure 1B). Molecular profiling of the two Arabidopsis epiRIL populations has shown that only a fraction of induced epialleles remain stable in subsequent generations, the rest being subject to dynamic modifications ( Johannes et al. 2009; Reinders et al. 2009). Two basic patterns are beginning to emerge. The first pattern indicates that a subset of epialleles undergo rapid and stochastic fluc- tuations over a wide spectrum of chromatin states, many of which are outside of the parental range (Reinders Figure 1.—Construction of epigenetic recombinant inbred et al. 2009). While such alterations can be causative of lines (epiRIL): (A) Induction of epigenomic perturbation by means of a mutation in genes involved in chromatin control, phenotypes within a given generation, they probably do fi latkin followed by sel ng (plants) or sibling mating (mammals) of not contribute to phenotypic inheritance (S conditional intercross (F 2jC.C) or backcross (BCjC.C) prog- 2009) and can therefore be regarded as noise in the eny. It is assumed that the parental strains have nearly identi- underlying heritable substrate. The second, and more cal DNA sequences (shaded horizontal bars) but different important pattern, is a systematic and gradual reversion epigenomic profiles (white, gray, and black triangles). The of mutant epiallelic states to those of the wt over the exceptions are rare de novo sequence changes (not shown for simplicity) resulting from compromised chromatin states course of several generations. This process represents in the mutant parent (P 2jc.c). (B) Same crossing scheme as an intrinsic rescue system that is invoked to restore above but using environmental manipulations in place of proper function and integrity. Loci that meet to invoke the initial perturbation. this pattern tend to correspond to sequences that are continuously targeted by the RNA-directed DNA THEORY methylation (RdDM) machinery ( Johannes et al. 2009; Teixeira et al. 2009; Teixera and Colot 2010). Conceptual basis: Consider a locus, L, extensively in- Epiallelic instabilities, as described above, create volved in genome-wide chromatin control. a complex source of heritable variation. These proper- C.C at this locus corresponds to proper chromatin ties pose challenges to the way we have to approach maintenance, whereas the mutant genotype c.c induces quantitative inheritance in the epiRIL or similar pop- global chromatin changes (e.g., modifications of DNA ulations. The key task is to simultaneously account for or histone methylation). We start with two inbred two processes: The first involves the meiotic transmis- parents, a wild-type parent, P1jC.C, and a mutant par- sion of maternal and paternal DNA sequence haplo- ent, P 2jc.c. By design, the two parents have identical types according to Mendelian laws. The second is DNA sequences (except at locus L and inevitably at a dynamic process that governs continuous changes in a small number of other loci; Mirouze et al. 2009; the chromatin states (epialleles) harbored by these Tsukahara et al. 2009), but drastically divergent chro- haplotypes and leads to non-Mendelian patterns of matin profiles (Figure 1A). inheritance. Here we develop the necessary theoretical As a result of the epigenomic perturbation induced foundation to quantify these processes using epiRILs by the mutation, the two parents will differ in their (Figure 1A) as a model system. We find that epiallelic chromatin states at N loci determining a quantitative reversion, recombination, parent-of-origin effects, and trait y. Suppose that in the P1jC.C parent N(1 2 t)of transgressive segregation are key parameters in these these loci have stable epigenotype V.V and Nt of the populations: Their joint effects can produce complex loci have stable epigenotype v.v (Figure 2A). Here epi- and highly dynamic inheritance patterns that cannot allele V corresponds to a phenotypically increasing and be predicted from strictly Mendelian models. In the v to a decreasing chromatin state. Relative to P1jC.C, limiting case of fully stable epialleles our model reduces the mutant parent, P 2jc.c, will have undergone the to the classical theory of experimental line crosses and following possible epiallelic changes at the N loci: ~ thus illustrates a fundamental continuity between ge- V / v, V/v~, v / V, and v/V, where the tilde netic and epigenetic inheritance. In what follows we () signifies an unstable epiallelic state (Figure 2A). present the first comprehensive attempt to quantify epi- We suppose that a proportion s of the newly induced genetic inheritance in model organisms. epialleles remain stable in subsequent generations (V Quantitative Epigenetics in Isogenic Lines 217

Figure 2.—Epigenomic structure of the parental strains and epiallelic re- version: (A) As a result of the perturba- tion the wt (P1jC.C) and the mutant (P2jc.c) parents will differ in their dip- loid chromatin states (epigenotypes) at N loci. The mutant (P2jc.c)parentwill have a proportion t of phenotypically increasing epigenotypes (V.V and ~ ~ V:V) as well as a proportion (1 2 t) of decreasing epigenotypes (v.v and v~:v~), with s and (1 2 s)ofthesebeing either stable or unstable, respectively (see text). (B) Unstable epialleles ~ ðV; v~Þ induced in the mutant parent Ćhave the capacity to revert to wt states over time according to some function, g(t). This reversion can be perfect or imperfect. We quantify epialleles in the parental 5 ~ 5 1 5 ~ 521 generation by setting V V 2 and v v 2 .Startingatt ¼ 0(theF 2jC.C or BC jC.C generation) the state values of reversible epialleles begin to change.

and v) (Figure 2A). This is even the case when proper Transgenerational epigenetic dynamics: We quantify chromatin maintenance function is restored. Since stable the epigenotype at locus j at time t using the coding epialleles behave like DNA sequence changes at the pop- introduced in Table 1. ulation level, we make no formal distinction between Parental generation: Since all individuals are assumed them. Only direct molecular profiling and sequencing homozygous in the parental generation, the total phe- of the two parents and their cross-derivatives will make notypic variance can be expressed as it possible to uncover the physical basis of such stable induced alterations. s2 5 s2 1 d t 2 2; Apart from stable epialleles, we also assume a pro- PðyÞ e ½N ð2 1Þ (1) ~ portion (1 2 s) of newly induced epialleles (V and v~) with the capacity to revert to approximate wild-type states 2 where se is the pooled within-line (environmental) var- over generations. We quantify this physical process iance, and d is henceforth defined as the average con- g through a function (t), which describes the progressive tribution of a single locus to the between-parental changes of epiallelic states in continuous time (Figure ~ phenotypic mean difference D (Serebrovsky 1928): 2B). Specifically, V and v~ correspond to sequences that ohannes are targeted by the RdDM machinery ( J et al. D 2009; Reinders et al. 2009; Teixeira et al. 2009; Teixera d 5 : (2) 2N ð2t 2 1Þ and Colot 2010) or possibly other correction mecha- nism. For the cross design shown in Figure 1A, this d reversion begins after the C.C genotype has been rein- This expression assumes that is equivalent over all d d ... d d troduced at L, that is, in the conditional F2 (F 2jC.C) or N causative loci; that is, 1 ¼ 2 ¼ ¼ N ¼ . backcross (BCjC.C) populations (Figure 2, A and B). Note in Equation 1 that when the transgression pa- t ¼ s2 ð Þ 5 s2 : Upon further propagation of individual lines from these rameter 0.5 we have that P y e In this limiting conditional populations by means of selfing (plants) or case the total phenotypic variance in the parental gen- sibling mating (mammals), progressive epiallelic rever- eration is purely environmental, despite drastic func- sion continues through recurrent maintenance action tional divergence between the parents. This situation at each generation. Including the initial perturbation, provides the condition for a maximum gain in trans- gressive variance in subsequent generations. It follows the different epiallelic fates outlined above can be sum- fi marized schematically as follows: from the fact that each parent is xed for both pheno- typically decreasing and increasing states so that recom- reversion gðtÞ reversion gðtÞ bination events can produce offspring with more ~ ~ v bv ! V bV ! extreme epigenotypes (Riesenberg et al. 1999). We ; v V a stable inheritance a stable inheritance : therefore refer to parameter t as a measure of “trans- v ! v ! V V gression potential” in the parental generation, which can become “realized,” in some sense, in subsequent Our goal is to model this process for any generation generations. This phenomenon is discussed in detail of . This allows us to draw direct connections below. between the basic properties of epialleles and their F1 and base-population: Crossing P1jC.C · P 2jc.c yields impact on heritable variation at the population level. the F 1 generation. We assume that epiallelic states 218 F. Johannes and M. Colomé-Tatché

TABLE 1 Coding of epigenotypes at a locus

Value Epigenotype Additivity Mutant wt dominance V.V 11 11 11 v.v 21 21 21 V.v 0 11 (if V is the mutant) 21 (if V is the mutant) 21 (if v is the mutant) 11 (if v is the mutant) ~ ~ V:V 2g(t)2g(t)2g(t) v~:v~ 22g(t) 22g(t) 22g(t) ~ V:v 1/2 1 g(t)2g(t) 21 V:v~ 1/2 2 g(t) 22g(t) 11 Shown is the coding used for the different epigenotypes at a given locus at time t. We distinguish between additivity, complete dominance due to mutant parental epialleles, and complete dominance due to wt parental epialleles. induced in the P 2jc.c parent remain stable (Teixeira 2 ; ≃ N d2 1 2 2 2 et al. 2009), but this assumption can be easily relaxed s ðh tÞ 4 q1 4g g 1 s 1 3s 1 if necessary. As a consequence, the phenotypic vari- 1 ðN 2 1Þq ½11s 12gð1 2 sÞ2ð1 2 2tÞ2g; ance may or may not (Richards 2009) be equivalent 2 to the environmental variance. As shown in Figure 1A (5) the F1jC.c are used to derive the base population for advanced inbreeding generations either through whereweputg ¼ g(t) to lighten the notation. The · abackcross(F1jC.c P1jC.C) or through an inter- parameters q1 and q2 depend on the mode of ac- cross (F1jC.c · F1jC.c). From these crosses, only the tion (additivity or dominance), the type of inbreed- C.C progeny (BCjC.C or F 2jC.C) are selected to ini- ing scheme (selfing or sibling mating), and the base tiate the inbreeding process through selfing or sibling population (backcross or F2-intercross) used to initi- mating. This permits a detailed study of the time- ate the inbreeding processes. Table 2 provides a sum- fi dependent behavior of parental epialleles indepen- mary of the speci cformofq1 and q2 in each of these dent of the recurrent action of the c.c genotype. For cases. simplicity we ignore the introgression of wt epigeno- An important observation is that dominance (complete types surrounding locus L asaresultofthisselection dominance in our case) appears as a time-dependent procedure. phenomenon, owing not only to the progressive deple- Advanced inbreeding generations: At any time point t of tion of heterozygote epigenotypes, but also to the rever- inbreeding, the variance in trait y is the sum of an epi- sion of mutant epialleles to wt states. It is therefore genetic component, s2(h,t), and an environmental necessary to distinguish two types of dominance effects, (error) component, s2(e): one being attributable to epialleles inherited from the mutant parent (P 2jc.c) and the other being due to epi- alleles deriving from the wt parent (P 2jC.C) (see Tables s2ðy; tÞ 5 s2ðh; tÞ 1 s2ðeÞ: (3) 1 and 2). As we show, this distinction has an effect on We assume that epigenotypes are uncorrelated with the epigenetic variation when inbreeding is carried for- the error term e, and that the error terms are un- ward from a backcross base population, as a result of correlated across generation times. With equal effect the initial asymmetry of the epigenotype frequencies. In sizes across the N causative loci, the epigenetic variance the case of a F2 base population, on the other hand, the can be approximated as epigenetic variance component is equivalent under the two dominance scenarios (what differs is the pheno- typic mean of the population). 2 ; ≃ 2 2 1 2 ; ; ; s ðh tÞ d N fs ðhj j tÞ ðN 1Þsðhj hk j t rÞg (4) Estimation of the number of induced quantitative trait loci (QTL): As an extension of previous biometri- 2 astle erebrovsky where s (hj j t) is the variance at a single locus j at cal approaches (C 1921; S 1928; ; ; ande eng time t and sðhj hkjt rÞ is the between any L 1981; Z 1992), Equation 5 can be used di- two loci j and k separated by an average pairwise re- rectly to obtain conservative estimates of the number of combination fraction r (Franklin 1970). It can be QTL (N ) resulting from the initial epigenomic pertur- shown (see Appendix A) that Equation 4 has the ex- bation in the parental generation. To achieve this, we plicit form substitute d from Equation 2 into the equation for the Quantitative Epigenetics in Isogenic Lines 219

TABLE 2 q q Parameter values 1 and 2

Base Effect Parameter values Selfing 1 F2 Add. q 5 2 1 1 2t11 ! 1 2r 2 1 ð1 2 2rÞt 1 q 5 2 1 2 2r 1 1 2t11

1 1 2 22 2t F2 Dom. q 5 1 22t12 ! 1 2 2r 2 1 4r 3 t12 1 t ð1 2 2rÞ 1 2 2r 1 q 5 112r r 2 1 2 1 2 2 2t11 1 1 2r 1 1 2r 1 1 2r 2t12

1 3 BC Add. q 5 2 1 2t11 4 ! t 2r 2 1 ð1 2 2rÞ 1 2 r 3 q 5 2 2 2r 1 1 2t11 4

1 1 1 2t 1j 2 3 · 22t BC Dom. q 5 ; j 5 1/ wt dominant; j 521/ mutant dominant 1 22t12 h i 2 2 t11 52 1 ð1 4r Þj 1 2 1 2 t 2 ð1 2 2rÞ q2 2t11 2r 1 1 r 1 ð1 2rðr 1ÞÞ 1 1 2r

2 3 2r 2 1 2 1 ; 4 2r 1 1 2t12 j 5 1 / wt dominant; j 521 / mutant dominant

Sib.mat pffiffiffi pffiffiffi 2 t 1 t 1 1 5 3ffiffiffi 1 1 5 3ffiffiffi F2 Add. q1 521 2 p 2 1 1 p 1 1 4 4 5 4 4 5 ! 1   1 ð 2 1Þt 1 1 pffiffiffi pffiffiffi pffiffiffi pffiffiffi F2 Dom. q 5 2 ð3 5 2 7Þð1 2 5Þ2t 2 ð3 5 1 7Þð11 5Þ2t 2 5 1 5 2112t 2314t pffiffiffi pffiffiffi 2 t 1 t 3 1 1 5 2ffiffiffi 1 1 5 2ffiffiffi BC Add. q1 52 2 p 2 1 1 p 1 1 4 4 4 5 4 4 5 h t pffiffiffi pffiffiffi pffiffiffi pffiffiffi 1 ð 2 1Þ 1 t t q 5 1 1 1 jð5 2 2 5Þð1 2 5Þ 1 jð5 1 2 5Þð11 5Þ BC Dom. 1 5 23 2t 22 2t i pffiffiffi pffiffiffi pffiffiffi pffiffiffi 1 1 2 2 2t 1 1 1 2t 2 15 ; 2212t ðð9 4 5Þð1 5Þ ð9 4 5Þð1 5Þ Þ 4 j 5 1 / wt dominant; j 521 / mutant dominant fi The analytical form for the parameters q1 and q2 in Equation 5 is speci ed for the different combinations of mating scheme (selfing or sibling mating), base-population (Base), additivity (Add.), or complete dominance (Dom.) effects (see Appendix A). For sibling mating, q2 can be exactly evaluated numerically for any given value of r. epigenetic variance (Equation 5), and solve for N at additivity, s ¼ 1 (fully stable epialleles), t ¼ 0 (no trans- any generation t of inbreeding to obtain gression), r 5 0:5 (linkage equilibrium among N loci), n o 2 2 2 – D q1½1 2 4gð1 1 gÞðs 2 1Þ 1 3s 1 q2½1 2 2gðs 2 1Þ1s ð1 2 2tÞ and t ¼ 0, Equation 6 reduces to the well-known Castle N 5 n o : ð6Þ 2 2 2 2 astle ð1 2 2tÞ D q2½1 2 2gðs 2 1Þ1s 2 16s ðh; tÞ Wright estimator (C 1921). The most important result of this manuscript is Equa- It is perhaps interesting to note that if we consider tion 5. It formalizes the relationship between epiallelic the restrictive case of a F2 base population with reversion (via g(t) and s), recombination (via r), 220 F. Johannes and M. Colomé-Tatché transgression (via t) and parent-of-origin effects (by (Johannes et al. 2009; Teixeira et al. 2009). Moreover, keeping track of epiallelic origins). This relationship gross perturbations of the epigenome can lead to de novo jointly determines the epigenetic variation in the pop- sequence variation through the insertion of remobilized ulation during inbreeding. In the following section we transposable elements or other structural abnormalities. examine more closely the complex and highly dynamic These induced alterations in the parental generation patterns of heritable variation that can arise from it. also contribute to the fraction of stable segregating var- iation. The precise proportions depend on the particular perturbation, organism, and experimental setup and RESULTS needs to be determined on a case-by-case basis. Classical theory of experimental line crosses typically We explore the effect of different proportions as- assumes no transgression (t ¼ 0) and the transmission suming a fixed reversion function and no transgression, of fully stable parental alleles (s ¼ 1). Sometimes, it is t ¼ 0. Figure 3, A–C (II), illustrates the effect of this further assumed that all loci are in linkage equilibrium type of mixed inheritance with the phenotypic variance ðr 5 0:5Þ: These constraints represent special cases of being decreased over generation time due to reversion the theory developed here. We treat these scenarios as but converging to a stable value as t / N. The final a reference against which to compare the rich spectrum variance represents the stable heritable substrate that of epigenetic inheritance in epigenomically perturbed can be gleaned from the initial perturbation. It should line crosses. The phenotypic variance at any time point be clear that as the proportion of stable epialleles in the is simultaneously determined by all of the parameters genome approaches unity (s / 1), our model converges specified in Equation 5. For clarity we assess their influ- to the familiar case of , which is the ences systematically by varying them one at the time. basis of the classical theory of experimental line crosses. Several key population-level phenomena are consid- Effects of imperfect epigenomic resetting: The re- ered for the case of selfing starting from a backcross version of mutant epialleles to the wt state can be an base population. The case of selfing from a F2 base imperfect process (Figure 2B). It is possible that epial- population can be found in the supporting informa- leles converge to values outside of the parental range tion. Throughout, we fix the average recombination (Reinders et al. 2009). For example, the remethylation rate ðrÞ at 0.44, a value that is based on the Arabidopsis of hypomethylated mutant epialleles could produce genetic map (Lynch and Walsh 1998). hypermethylated states in subsequent generations rela- Inheritance of unstable alleles: We first consider the tive to the wt (Reinders et al. 2009). We consider this case of complete epiallelic instability (s ¼ 0) and no case by letting all unstable epialleles revert to state transgression (t ¼ 0). In this case all induced mutant values that are twice that of the wt parent. This implies epialleles are effectively reverted to the wt state over that the reversion of mutant states must first pass time. The specific form of the reversion function that through wt states before they reach their final stable governs this process is currently unknown, but should state. This process leaves an interesting signature at be of substantial interest in future empirical studies the population level: It leads to an initial depletion of (Johannes et al. 2008). Although it is reasonable to epigenetic variance, before there is an unexpected gain assume that reversion is locus specific, for the purpose in heritable variance at later generations (Figure 3, A–C of providing an average description of the system it (III) light gray solid lines). suffices to consider an average (between-locus) rever- A different pattern occurs in situations where epi- sion function, g(t). We tentatively posit the form g(t) ¼ allelic reversion converges to intermediate parental 1=2 2 2=p arctanðbðt 11ÞÞ, where b is the rate parame- values (0.5 of wt values). In this case, heritable vari- ter (Figure 2B). We vary b so that we can examine the ation is never completely lost (Figure 3, A–C (III), dark full range from slow to fast reversion. In contrast to gray solid lines). Note that this latter pattern mimics stable Mendelian inheritance (Figure 3, A–C (I), black the situation observed for the case of mixed inheritance solid and dashed line), the reversion function has the (previous section). Distinguishing these two possibilities effect of eroding heritable variation over time so that as empirically therefore depends on detailed knowledge of t /N the heritable variation in the population is pro- the average reversion function, g(t), and the proportion gressively lost (Figure 3, A–C (I), dark gray solid lines). of stable epialleles, s, operating in a given population. Mixed inheritance of stable and unstable epialleles: Realized transgression potentials: It has been shown Complete instability of epialleles is an extreme case. It is that transgressive segregation is widespread in experi- more likely that a proportion of the induced epialleles mental line crosses (Riesenberg et al. 1999). A major remain stable and produce Mendelian inheritance reason for this is that the parents have often not un- patterns ( Johannes et al. 2009; Reinders et al. 2009). dergone divergent selection prior to crossing and are Indeed, preliminary molecular profiling of the ddm1-de- therefore each fixed for both increasing and decreasing rived epiRILs suggests that only up to one-half of the . In QTL experiments this is reflected in the tested epialleles are reversible with the remaining loci detection of QTL with opposite signs relative to the being stable for at least eight generations of inbreeding parental means. Quantitative Epigenetics in Isogenic Lines 221

Figure 3.—Transgenerational dynamics of epigenetic variation. We show the dynamic behavior of epigenetic variation in the case of additivity (A), complete dominance of mutant epialleles (B), and complete dominance of wt epialleles (C). Throughout we show the Mendelian inheritance case with linkage (r ¼ 0.44, s ¼ 1 and t ¼ 0, solid black line) and with no linkage (r ¼ 0.5, s ¼ 1 and t ¼ 0, dashed black line). For each mode of action we consider four key phenomena: inheritance of unstable epialleles (A–C, I): s ¼ 0, t ¼ 0, and variable reversion function with rate parameter b evaluated at equal increments over the range 0.1 (top line) to 0.5 (bottom line); x-axis plots generation time and y-axis gives the epigenetic variance, s2(h, t). Mixed inheritance of stable and unstable epialleles (A–C, II): t ¼ 0, fixed reversion function with b ¼ 0.2, and variable s evaluated at equal increments over the range 0 (bottom line) to 0.9 (top line); x-axis and y-axis defined as above. Effects of imperfect epigenomic resetting (A–C, III): Parameter settings were chosen as in (A–C, I) but reversion was assumed to be imperfect with transitions to epiallelic states that are twice the initial wt state (dark gray solid lines) or intermediate between wt and mutant (light gray solid lines); x-axis and y-axis defined as above. Realized transgression potentials (A–C, IV): s ¼ 0.5, fixed reversion function with b ¼ 0.2, and variable t evaluated in equal increments over the range 0 (bottom line) to 0.35 (top line); the x-axis plots generation time and the y-axis 2 ; 2 2 2 21: gives the fold change in epigenetic variance relative to the between-parental variance, s ðh tÞ½sPðyÞ se

In perturbation-derived populations as the ones de- between 0 and 0.35 (t ¼ 0.5 being the maximum). The scribed here, transgressive segregation is likely to be effect for large t is dramatic. Under additivity, and assum- a key aspect of quantitative inheritance. The removal ing t ¼ 0.35, we find that heritable variation increases in of methylation, for instance, can lead to active or inactive the order of fivefold relative to the between parental vari- chromatinstatesinthemutantparent,whichcantranslate ance (Figure 3A, IV). This effect is further exaggerated into increasing or decreasing phenotypic values, depending when mutant epialleles act dominantly. In this case we ob- on whether the underlying sequences are involved in serve a nearly 11-fold increase in heritable variation at early inhibitory or facilitating functionsinthenetworksthat generations (Figure 3B, IV) followed by a gradual decrease connect (epi)genotype with (Figure 2A). due to the progressive loss of hererozygote epigenotypes. We explore the effect of different transgression poten- Our treatment of transgression represents the first tials (t)forafixed reversion function and s ¼ 0.5. We vary t theoretical approach to highlight the importance of 222 F. Johannes and M. Colomé-Tatché transgressive segregation in generating phenotypic in- et al. 2010; Feng et al. 2010), but also show substantial novation in experimental line crosses. interindividual variation within populations (Vaughn Application: A recent analysis of the ddm1-derived et al. 2007; Zhang et al. 2008; Kaminsky et al. 2009). Arabidopsis epiRILs reported large heritable variation The possibility that this type of epigenetic variation for plant height. This work has shown that perturba- can influence phenotypes independent of DNA se- tions of plant methylomes are sufficient to induce last- quence variation poses major challenges to our current ing phenotypic consequences in commonly studied understanding of complex trait inheritance. complex traits. An important question concerns the The first problem is that sequence-based mapping specific features of the heritable architecture that has approaches (i.e., linkage or association mapping) may be been set up in the epiRILs, such as its physical basis insufficient to fully capture the heritable architecture of (e.g., sequence vs. methylation based) or the number complex traits ( Johannes et al. 2008). A recent study by and sizes of induced QTL. Definitive answers to this Kong et al. (2009), for example, illustrates that knowl- question require genome-wide epigenetic profiling tech- edge of the epigenetic status of sequence alleles (in this niques (e.g., ChIP-chip, ChIP-seq, or BS-seq) in combina- case, the parent-of-origin of the alleles) is often neces- tion with QTL mapping methods as well as resequencing sary to establish significant associations with pheno- of each line ( Johannes et al. 2009). While such efforts types. Although the authors found that these effects are underway, we here consider using Equation 6 as the were sex specific, it raises the need to routinely include basis for deriving the first estimates of the number of both levels of variation (DNA sequence or chromatin QTL (and their average effects sizes) underlying plant state) in the analysis. height in this population. The second and partially related problem is that Since phenotypic data from only two time points is the potential temporal instability of epigenetic variation available (parental generation and generation eight of can produce a level of phenotypic dynamics at the inbreeding), it is necessary to make informed assump- population-level that cannot be predicted from strictly tions about several of the unknown parameters speci- Mendelian models of inheritance. Here we have out- fied in Equation 6. These assumptions rely heavily on lined an experimental and theoretical approach to previous molecular or phenotypic observations of this begin to address this issue. The approach relies on population ( Johannes et al. 2009; Teixeira et al. 2009) the use of a perturbation strategy to induce epigenetic and are explicitly listed below: variation in isogenic lines, followed by a transgenerational assessment of derivative populations. This provides an • s2(h, t) ¼ 11.2: estimated from a random-effects ideal platform for studying the temporal properties of model. epialleles and permits a theoretical description of these • D ¼ 9.92: difference in phenotypic means (in centi- properties in connection with the inheritance of com- meters) between the ddm1 and wt parents. plex traits observed in such populations. • s ¼ 0.5: based on an analysis of a random subset of Extensions using environmental triggers: Two exper- loci. imental examples of this approach have recently been • r 5 0:44: average recombination fraction for the Ara- implemented in the model plant Arabidopsis using loss- bidopsis genome (Lynch and Walsh 1998). of-function mutations in ddm1 and met1, two genes in- Values for the transgression potential, t, remain elu- volved in genome-wide methylation maintenance, to sive because this quantity cannot be directly measured initiate the perturbation. While this continues to be using molecular techniques. Keeping this limitation in a valuable resource, future experiments should attempt mind, we show various estimates for the number of QTL to produce similar populations using environmental ma- (N ) for different values of t (Figure 4). They range nipulations in place of mutations (Figure 1B). Bossdorf from as low as 1 (for t ¼ 0) to as high as 6 (for t ¼ et al. (2010), for instance, demonstrated that treatment 0.31). A unique estimate of N can be obtained by fixing t of Arabidopsis ecotypes with demethylation agents is at its theoretical average (see Appendix B). In this case we sufficient to invoke phenotypically relevant methylation find that N 5 2:26 (95% CI ¼ 6 0.31) (Figure 4, black changes. Similarly, Verhoeven et al. (2010) used envi- bar). This (conservative) estimate suggests the induction ronmental stressors, such as chemical induction of her- of a polygenic heritable architecture, with each QTL bivore and pathogen defenses, and noted heritable explaining 14% of the phenotypic variance in plant DNA methylation changes in asexual dandelion. These height. Given these considerably large effect sizes, the un- types of interventions may be sufficient to induce and derlying causative loci should be mappable in future inte- propagate epigenetic variation in the parents and their grative QTL studies, even with relatively small sample sizes. cross derivatives. A demonstration of this should have important implications for evolutionary theory, which traditionally draws a clear divider between the environ- DISCUSSION ment and the heritable material. There is certainly need Epigenetic modifications, such as DNA methylation, for modeling approaches that incorporate environmen- are not only widely conserved across species (Zemach tally induced epigenetic changes into evolutionary Quantitative Epigenetics in Isogenic Lines 223

genomic resetting events on the covariance between relatives. We argue that the construction of mammalian crosses between isogenic parents with perturbed and unperturbed epigenomes will be critically important to begin to extrapolate results to humans. The theory outlined in this article, particularly the results for sibling mating, is entirely compatible with an analysis of experimental mammalian populations (e.g., mice or rats). However, the initial construction of such crosses Figure 4.—Estimates of the number of QTL. Plotted are using perturbations may be more complicated than in estimates of number of QTL (N ) against the transgression plants, given that mutationsingenescontrollingDNA potential parameter t. Parameters were set according to the methylation tend to be lethal (Li et al. 1992). Instead, shade coding shown in the figure. Confidence intervals (95%) were calculated using 1000 nonparametric bootstrap samples one could consider other mutants, partial knockdowns, (Appendix B). The black solid vertical line marks the or any suitable environmental manipulation strategy. An- expected maximum detectable N in a backcross-derived RIL, other complication is to distinguish instances of maternal which is approximately 6 (see Appendix B). or paternal imprinting. One solution would be to set up reciprocal crosses to delineate these effects (i.e., pertur- theory. More general attempts to assess the role of epi- bation of progenitor mother vs. father). genetic inheritance in the context of selection and ad- Conclusion: Our theory attempts to connect recent aptation have been undertaken (Csaba 1998; Pál and observations of the dynamic properties of epialleles Miklós 1999; Bonduriansky and Day 2009). The in- (i.e., DNA methylation variants) to a long tradition of clusion of epiallelic reversion, as we have formalized it quantitative genetics. We have shown that in the case of here, would be an appealing extension. It is tempting to fully stable epialleles, genetic and epigenetic inheri- speculate that such reversion processes have evolved to tance are formally indistinguishable. This illustrates that facilitate short-term adaptation of populations to rapid there is actually no dichotomy between these two modes environmental changes. These issues are outside of the of inheritance. Rather, they should be viewed as differ- scope of this work. ent points on a continuum that ranges from stable to Considerations for mammalian populations: Studies unstable inheritance. We therefore hope that our work of epigenetic inheritance have been primarily pursued will help to bridge the gap between the fields of genet- in plant species, and much less is known about the ics and epigenetics. transgenerational behavior of epialleles in mammalian We thank several anonymous reviewers for detailed suggestions. We populations. The dominant paradigm dictates that the also thank Ritsert C. Jansen and Vincent Colot for helpful comments epigenome is completely reset during early mammalian on earlier versions of this manuscript. F. Johannes was supported by development (Reik 2007; Feng et al. 2010), which a Horizon Breakthrough grant (Netherlands Organisation for Scien- fi implies that induced epigenetic effects are not carried ti c Research (NOW)), and M. Colomé-Tatché acknowledges support from the Centre for Quantum Engineering and Space-Time Research to subsequent generations. Formally, this suggests a re- (QUEST). version process that follows a step function, dropping to wt levels at t ¼ 21(first-generation progeny of two parents). However, several well-documented single-locus LITERATURE CITED examples of epiallelic inheritance exist in mammalian systems (Rakyan et al. 2003; Blewitt et al. 2006; Daxinger Biemont, C., 2010 Inbreeding effects in the epigenetic era. Nat. hitelaw fi Rev. Genet. 11: 234. and W 2010). These ndings are probably no Blewitt, M., N. Vickaryous,A.Paldi,H.Koseki and E. Whitelaw, exception as more recent genome-wide surveys of mouse 2006 Dynamic reprogramming of DNA methylation at an epi- gametes show clear instances of transmitted parental genetically sensitive allele in mice. PLoS Genet. 2: e49. fi orgel Bonduriansky, R., and T. Day, 2009 Nongenetic inheritance and DNA methylation pro les (B et al. 2010), suggest- its evolutionary implications. Annu. Rev. Ecol. Evol. Syst. 40: 103– ing that epigenetic inheritance may be much more 125. widespread in mammalian populations than previously Borgel, J., S. Guibert,Y.Li,H.Chiba,D.Schu¨beler et al., acknowledged. 2010 Targets and dynamics of promoter DNA methylation dur- ing early mouse development. Nat. Genet. 42: 1093–1100. By help of transgenerational phenotypic data, con- Bossdorf, O., C. Richards and M. Pigliucci, 2008 Epigenetics for crete hypotheses about the extent of epigenomic re- ecologists. Ecol. Lett. 11: 106–115. Bossdorf Arcuri Richards Pigliucci setting can be tested on statistical grounds using the , O., D. ,C. and M. , 2010 Experimental alteration of DNA methylation affects the experimental framework outlined in this article. This of ecologically relevant traits in Arabidopsis can be achieved by considering alternative reversion thaliana. Evol. Ecol. 24: 541–553. fi Broman, K., 2005 The of recombinant inbred lines. functions in a t to the data. Such a proposal is akin to – al Genetics 169: 1133 1146. the approach recently developed by T et al. (2010), Bulmer, M. G., 1985 The Mathematical Theory of Quantitative Genetics. which permits hypothesis tests about effect of epi- Clarendon Press, Oxford. 224 F. Johannes and M. Colomé-Tatché

Castle, W., 1921 An improved method of estimating the number Petronis, A., 2010 Epigenetics as a unifying principle in the aetiol- of genetic factors concerned in cases of blending inheritance. ogy of complex traits and diseases. Nature 465: 721–727. Science 54: 223. Rakyan,V.,M.Blewitt,R.Druker,J.Preis and E. Whitelaw, Cokus, S., S. Feng,X.Zhang,Z.Chen,B.Merriman et al., 2002 Metastable epialleles in mammals. Trends Genet. 18(7): 348–351. 2008 Shotgun bisulphite sequencing of the Arabidopsis genome Rakyan, V., S. Chong,M.Champ,P.Cuthbert,H.Morgan et al., reveals DNA methylation patterning. Nature 452: 215–219. 2003 Transgenerational inheritance of epigenetic states at the Csaba, P., 1998 Plasticity, memory and the adaptive landscape of the murine Axin(Fu) allele occurs after maternal and paternal trans- genotype. Proc. R. Soc. Lond. B 265: 1319–1323. mission. Proc. Natl. Acad. Sci. USA 100: 2538–2543. Daxinger, L., and E. Whitelaw, 2010 Transgenerational epige- Reik, W., 2007 Stability and flexibility of epigenetic gene regulation netic inheritance: more questions than answers. Genome Res. in mammalian development. Nature 447: 425–432. 20: 1623–1628. Reinders, J., B. Wulff-Brande,M.Mirouze,A.Mari-Ordonez, Eichler, E., J. Flint,G.Gibson,A.Kong,S.Leal et al., M. Dapp et al., 2009 Compromised stability of DNA methylation 2010 Missing and strategies for finding the underly- and transposon immobilization in mosaic Arabidopsis epige- ing causes of complex disease. Nat. Rev. Genet. 11: 446–450. nomes. Genes Dev. 23: 939–950. Feng, S., S. Cokus,X.Zhang, P.-Y. Chen,M.Bostick et al., Richards, E., 2006 Inherited epigenetic variation–revisiting soft in- 2010 Conservation and divergence of methylation patterning heritance. Nat. Rev. Genet. 7: 395–401. inplantsandanimals.Proc.Nat.Acad.Sci.USA107: 8689– Richards, E., 2008 Population epigenetics. Curr. Opin. Genet. Dev. 8694. 18(2): 221–226. Franklin, I., 1970 Average recombination frequencies. Genetics 66: Richards, E., 2009 Quantitative epigenetics: DNA sequence varia- 709–711. tion need not apply. Genes Dev. 23: 1601–1605. Haldane, J., and C. Waddington, 1931 Inbreeding and linkage. Riesenberg, L., M. Archer and R. Wayne, 1999 Transgressive seg- Genetics 16: 357–374. regation, adaptation and speciation. 83: 363–372. Johannes, F., V. Colot and R. Jansen, 2008 Epigenome dynamics: Serebrovsky, A., 1928 An analysis of the inheritance of quantitative a quantitative genetics perspective. Nat. Rev. Genet. 9: 883–890. transgressive characters. Z. Indukt. Abstammungs. Verbungsl. 48: Johannes, F., E. Porcher,F.Teixeira,V.Saliba-Colombani,M. 229–243. Simon et al., 2009 Assessing the impact of transgenerational Slatkin, M., 2009 Epigenetic inheritance and the missing heritabil- epigenetic variation on complex traits. PLoS Genet. 5: e1000530. ity problem. Genetics 182: 845–850. Kaminsky, Z., T. Tang, S.-C. Wang,C.Ptak,G.Oh et al., 2009 DNA Tal, O., E. Kisdi and E. Jablonka, 2010 Epigenetic contribution to methylation profiles in monozygotic and dizygotic twins. Nat. covariance between relatives. Genetics 184: 1037–1050. Genet. 41: 240–245. Teixera, F., and V. Colot, 2010 Repeat elements and the Arabidop- Kimura, M., 1963 A probability method for treating inbreeding sys- sis DNA methylation landscape. Heredity 105: 14–23. tems, especially with linked genes. Biometrics 19: 1–17. Teixeira, F., F. Heredia,A.Sarazin,F.Roudier,M.Boccara et al., Kong, A., V. Steinthorsdottir,G.Masson,G.Thorleifsson,P. 2009 A role for RNAi in the selective correction of DNA meth- Sulem et al., 2009 Parental origin of sequence variants associ- ylation defects. Science 323: 1600–1604. ated with complex diseases. Nature 462: 868–874. Tsukahara, S., A. Kobayashi,A.Kawabe,O.Mathieu,A.Miura Lande, R., 1981 The minimum number of genes contributing to et al., 2009 Bursts of retrotransposition reproduced in Arabidop- quantitative variation between and within populations. Genetics sis. Nature 461: 423–426. 99: 541–553. Vaughn, M., M. Tanurdzic,Z.Lippman,H.Jiang,R.Carrasquillo Li, E., T. H. Bestor and R. Jaenisch, 1992 Targeted mutation of et al., 2007 Epigenetic natural variation in Arabidopsis thaliana. the DNA methyltransferase gene results in embryonic lethality. PLoS Biol. 5: e174. Cell 69: 915–926. Verhoeven,K.,J.Jansen,P.van Dijk and A. Biere, 2010 Stress- Lister, R., R. O’Malley,J.Tonti-Filippini,B.Gregory,C.Berry induced DNA methylation changes and their heritability in et al., 2008 Highly integrated single-base resolution maps of the asexual dandelion. New Phytol. 185: 10.1111/j.1469–8137. epigenome in Arabidopsis. Cell 133(3): 523–536. 2009.03121.x. Liu, B. H., 1998 Statistical . CRC Press, Boca Raton, FL. Vongs, A., T. Kakutani,R.Martienssen and E. Richards, Lynch, M., and B. Walsh, 1998 Genetics and the Analysis of Quantita- 1993 Arabidopsis thaliana DNA methylation mutants. Science tive Traits. Sinauer Associates, Sunderland, MA. 260: 1926–1928. Maher, B., 2008 Personal genomes: the case of the missing herita- Wright, S., 1933 Inbreeding and homozygosis. Proc. Natl. Acad. bility. Nature 456: 18–21. Sci. USA 19: 411–420. Manolio, T., F. Collins,N.Cox,D.Goldstein,L.Hindorff et al., Zemach, A., I. McDaniel,P.Silva and D. Zilberman, 2009 Finding the missing heritability of complex diseases. Na- 2010 Genome-wide evolutionary analysis of eurkaryotic DNA ture 461: 747–753. methylation. Science 328: 916–919. Mathieu,O.,J.Reinders,M.Caikovski,C.Smathajitt and Zeng, Z.-B., 1992 Correcting the bias of Wright’s estimates of the J. Paszkowski, 2007 Transgenerational stability of the Arabi- number of genes affecting a quantitative character: a further im- dopsis epigenome is coordinated by CG methylation. Cell 130 provement. Genetics 131: 987–1001. (5): 851–862. Zhang, X., S. Shiu,A.Cal and J. Borevitz, 2008 Global analysis of Mirouze,M.,J.Reinders,E.Bucher,T.Nishimura,K.Schneeberger genetic and epigenetic and transcriptional polymorphisms in Ara- et al., 2009 Selective epigenetic control of retrotransposition in bidopsis thaliana using whole genome tiling arrays. PLoS Genet. 4: Arabidopsis. Nature 461: 427–430. e1000032. Pa´l, C., and I. Miklós, 1999 Epigenetic inheritance, genetic assim- ilation and speciation. J. Theor. Biol. 200: 19–37. Communicating editor: D. W. Threadgill Quantitative Epigenetics in Isogenic Lines 225

APPENDIX A ing the following basic symmetries (supporting material V:v~ 5 v~:V; V:v~ · V:V 5 This appendix shows the derivation of the epigenetic in File S1, 1.4.1): and V:V · V:v~: b variance in the population, Equation 5. For the details The transition matrix T is the collection on the values of the parameters presented, we refer the of the probabilities of going from one mating type to another in one generation of sibling mating; it has di- reader to the supporting information. The epigenetic · variance can be written as mension 6 6 (supporting material in File S1, 1.4.3) (for n o a detailed description on how to construct such a matrix, see Bulmer 1985, Chap. 3). The initial probabilities of s2ðh; tÞ ≃ d2N s2ðh j tÞ 1 ðN 2 1Þsðh ; h jt; rÞ ; j j k each mating class, Qð0Þ; is a 6-dimensional vector given by Q (0) ¼ K (0)K (0), where i and j are the single- s2 h j (l) (i) (j) where ( j t) is the variance at a single locus j at time t locus epigenotypes involved in mating type l. At any gen- sðh ; h j ; Þ and j k t r is the covariance between any two loci eration t, QðtÞ 5 Qð0ÞTb t : The probabilities for the 4 j and k separated by an average pairwise recombination differentP single-locus epigenotypes are given by fraction r (Franklin 1970). 5 6 ; KðiÞðtÞ l 51pil Q ðlÞðtÞ where pil is the proportion of Variance at a single locus: The 12 different epigeno- single-locus i involved in mating type l (supporting mate- types at locus j resulting from the initial cross (Figure rial in File S1, 1.4.2). Using Equation 7 the variance at 2A) can be classified into four different classes of three ... a single locus is given by elements each. Denote these four classes by s1, , s4 É with corresponding probability weights W , ..., W 2 ; 5 1 1 2 2 2 : 1 4 s hj t q1 4g t g t 1 s 1 3s 1 (supporting material in File S1, 1.1). Since each locus 4 can belong to any one of these classes, its variance can (9) be written as the sum of the within-class variances The value of the parameter q depends on the case weighted by their probabilities: 1 considered (Table A2). X4 Covariance between loci: To calculate the covariance 2 ; 5 2 ; s hj t Wm s hj j sm t 5 term between loci j and k separated by an average re- m 1 ! ! fi X4 X4 X4 2 combination factor r we use a similar classi cation for 5 W KðiÞ t hðiÞðtÞ2 2 KðiÞðtÞhðiÞðtÞ ; m sm sm the possible 160 different two-locus epigenotypes result- m51 i51 i51 ing from the initial cross (Figure 2A). They can be ð7Þ assigned to 16 different classes of 10 pairs each, ... where KðtÞ is a vector of expected single-locus epigeno- d1, , d16, with probability Vi(hj, hk) (supporting ma- type probabilities and hðtÞ is a vector of expected epi- terial in File S1, 2.1). Since each two-locus epigenotype genotypes at locus j at time t (supporting material in can belong to any one of these 16 classes we can write File S1, 1.2). The probability vector KðtÞ is obtained the covariance as: P16 following a Markov chain approach. Below we specify ; ; ≃ ; ; ; ; sðhj hk j t rÞ Vm ðhj hk Þsðhj hk j dm t rÞ 5 the cases for selfing and sibling mating. m 1  P16 P10 fi 5 ðiÞ ðiÞ Sel ng: Kðt 0Þ is a three-dimensional vector with the 5 V ðh ; h Þ R ðiÞðtÞh ðtÞh ðtÞ m j k dm;1 dm;2 probabilities of each single-locus epigenotype, deter- m51 i51 ! !# X10 X10 mined by the type of base population (F2 or BC) (sup- 2 RðiÞðtÞhðiÞ ðtÞ RðiÞðtÞhðiÞ ðtÞ ; dm;1 dm;2 porting material in File S1, 1.3.1). The transition matrix i51 i51 b T for each class is a 3 · 3 matrix of transition probabil- ð10Þ ities (supporting material in File S1, 1.3.2). Using a Mar- kov chain approach we calculate the frequency of each where RðtÞ is a 10-dimensional vector of two-locus epi- epigenotype at time t using KðtÞ 5 Kð0ÞTb t : Using Equa- genotype probabilities (i.e., parental haplotypes). Here, h and h are the two different expected epigeno- tion 7 the variance at a single locus is given by dm;1 dm;2 types in a given class dm (supporting material in File S1, É 2.2). The probability vector RðtÞ is obtained following 2 ; 5 1 1 2 2 2 : s hj t q1 4g t g t 1 s 1 3s 1 a Markov chain approach; below we specify the cases 4 fi fi (8) for sel ng and sibling mating. In the simpli ed case of purely Mendelian inheritance (s ¼ 1), no transgres- N The value of the parameter q1 differs for the type of sion (t ¼ 0), and t ¼ ,thederivationofRðtNÞ has base population considered (F2 or BC) and for the type received considerable attention as a problem in its of epigenotypic effect considered (additivity or domi- own right (Haldane and Waddington 1931; Wright nance); see Table A1. 1933; Kimura 1963; Broman 2005). Sibling mating: Consider, instead of the probabilities Selfing: For each of the 16 classes Rðt 5 0Þ is calcu- of each single-locus, the probabilities of their mating lated depending on the type of base population consid- types (Bulmer 1985). There are 16 possible mating ered (F2 or BC) (supporting material in File S1,2.3.1). types in each class, which can be reduced to 6 consider- The transition probability matrix Tb is a 10 · 10 matrix of 226 F. Johannes and M. Colomé-Tatché

TABLE A1 Unfortunately, the transition matrix Tb cannot be di- agonalized symbolically because its dimension is too Base Effect q 1 large (see Figure S2). For this reason we cannot obtain 1 an analytical expression (as a function of the parameter F2 Add. 2 1 2t11 r)fortheprobabilitiesRðtÞ: However, we can fix r to b 1 a numerical value before calculating the power T t ; and 1 2 22 2t F2 Dom. thus obtain the exact probability values at any time 22t12 t. Moreover, it is possible to write symbolically the proba- 1 2 3 bility vector as RðtÞ 5 fR ð1ÞðtÞ; R ð2ÞðtÞ; ...; R ð10ÞðtÞg and, BC Add. 1 P 2t 1 4 10 5 ; using Equation 10 and taking into account i 51RðiÞ 1 1 1 1 2t 1j 2 3 · 22t we can write the covariance for sibling mating in the form BC Dom. ; 22t12 n o 1 j 5 1 / wt dominant; s h ; h j t; r ≃ q ½11s 12gð1 2 sÞ2ð1 2 2tÞ2 ; (12) j k 4 2 j 521 / mutant dominant where the constant q2 is calculated exactly for any value of r for a F2- or a BC-based population. transition probabilities of each two-locus epigenotype crossed with itself (supporting material in File S1, Epigenetic variance: 2.3.2), and RðtÞ 5 Rð0ÞTb t : Using Equation 10 we obtain Finally, combining Equations 8 and 9 with Equations n o 1 2 2 11 and 12, and multiplying by the number of loci and s h ; h jt; r ≃ q ½11s 12gð1 2 sÞ ð1 2 2tÞ : (11) 2 j k 4 2 their mean phenotypic effect, Nd , yields Equation 5 in the main text.

The analytical value of the parameter q2 is shown in Table A3. APPENDIX B Sibling mating: Consider the 55 different mating In this appendix we show how we estimate the types between the 10 two-locus epigenotypes in each number of QTL in the ddm1-derived epiRILs. Consider class (supporting material in File S1, 2.4.1) (Bulmer the equation for N in the main text (Equation 6) at 1985). Taking into account the symmetries mentioned t ¼ N (i.e., fully inbred), under the assumption of per- above we can reduce them to 22 different mating types fect resetting (g(t ¼N) ¼ 21): for the F2- generated population and to 34 for the BC 2 one. In the same way as for the single-locus case, Qð0Þ 3D2sðsð1 2 2tÞ2 1 ð2r 1 1Þ=ð2r 2 1ÞÞ N 5 are the initial probabilities of each mating class, ð3D2s2 1 16s2ðh;tÞð2r 1 1Þ=ð2r 2 1ÞÞð1 2 2tÞ2 b t QðtÞ 5 Qð0ÞT ; and the probabilities for the 10 two- 5 f ðC; tÞ; locus epigenotypes, RðtÞ; can be extracted from the mating type probabilities QðtÞ (supporting material in where C is a vector of all the parameters specified on File S1, 2.4.2). the right-hand side of the equation. Substituting the

TABLE A2

Base Effect q1 pffiffiffi pffiffiffi 1 1 2 5 t 3 1 11 5 t 3 F2 Add. 2 1 2 pffiffiffi 2 1 1 pffiffiffi 1 1 4 4 5 4 4 5 ! 1   1 ð 2 1Þt 1 1 pffiffiffi pffiffiffi pffiffiffi pffiffiffi F2 Dom. 2 ð3 5 2 7Þð1 2 5Þ2t 2 ð3 5 1 7Þð11 5Þ2t 2 5 5 2112t 2314t pffiffiffi pffiffiffi 3 1 1 2 5 t 2 1 11 5 t 2 BC Add. 2 2 pffiffiffi 2 1 1 pffiffiffi 1 1 4 4 4 5 4 4 5 h 1 ð 2 1Þt 1 pffiffiffi pffiffiffi pffiffiffi pffiffiffi BC Dom. 2 j 2 2 t 1 j 1 1 t 312t 212t ð5 2 5Þð1 5Þ ð5 2 5Þð1 5Þ 5 2 2  1 pffiffiffi pffiffiffi pffiffiffi pffiffiffi 15 1 ðð9 2 4 5Þð1 2 5Þ2t 1 ð9 1 4 5Þð11 5Þ2t Þ 2 ; 2212t 4 j 5 1 / wt dominant; j 521 / mutant dominant Quantitative Epigenetics in Isogenic Lines 227

TABLE A3

Base Effect q2 ! 1 2r 2 1 ð1 2 2rÞt 1 F2 Add. 2 1 2r 1 1 2t11 ! 1 2 2r 2 1 4r 3 t12 1 t ð1 2 2rÞ 1 2 2r 1 F2 Dom. 112rðr 2 1Þ 2 1 2 2t11 1 1 2r 1 1 2r 1 1 2r 2t12 ! 2r 2 1 ð1 2 2rÞt ð1 2 rÞ 3 BC Add. 2 2r 1 1 2t11 4 h i 2 2 t11 BC Dom. 2 1 jð1 4r Þ 1 2 1 2 t 2 ð1 2 2rÞ 2 3 2r 2 1 2 1 ; 2t11 2r 1 1 ðr 1Þ ð1 2rðr 1ÞÞ 1 1 2r 4 2r 1 1 2t12

j 5 1 / wt dominant; j 521 / mutant dominant values for D, s2(h, t), s, and r provided in the main text ber of QTL occurs in a situation where each generated yields one equation and two unknowns (N and t). With segment is occupied by a QTL. This expectation can be heritability data from only one generation, it is not pos- approximated by sible to find unique solutions, and at least one additional generation of phenotypic measurements is required. XC Rj Obtaining N : In the absence of such additional data EðNmaxÞ ≃ 2 ; 1 2 R one strategy is to calculate an average N by integrating j51 j over the theoretical range of t, Ð where C is the total number of , and the uf ðC; tÞdt ratio on the right-hand side is the odds ratio of a re- N 5 0 Ð ; udt combination vs. no recombination breakpoint on chro- 0 mosome j. As a rule of thumb, it is safe to say that ≃ ≃ 6 : where u ¼ tj(N ¼ Nmax) is the upper integration limit. E(NmaxjF 2) 2C and EðNmaxjBCÞ 5C These latter To find a value for u we solve Equation 5 for t and expressions assume linkage equilibrium between the evaluate it at the expected maximum number of QTL, beginning and end of j, that is rj ¼ 0.5. Bootstrap standard errors: We obtain standard errors Nmax, which can be detected given the particular mat- ing scheme. for N using a nonparametric bootstrap approach. To N Obtaining the expected value for max: In a popu- achieve this we take the following steps: lation of RILs, let Rj denote the probability of a recom- 1. Recalculate D on the basis of a random sample of size binant type over theP entire length of the jth n from each of the two parental phenotypic vectors. 5 ðiÞ 5 N ; (i) chromosome, Rj i2R R ðt Þ where R (t) 2. Draw a random, stratified bootstrap sample from the fi are the components of RðtÞ (Appendix A) at xation epiRILs phenotypic vector, and approximate the epi- and R is the ensemble of recombinant two-locus epi- genetic variance, s2(h, t), using a random intercepts genotypes. Using our Markov chain approach for self- 1 1 e model: yi,j ¼ b0 bizi,j i,j, where b0 is a common ing, the value of Rj is given by fi xed intercept, bi is the random intercept of the ith e 2r 3r line, zi,j is an index variable, and i, j is the error. We 5 j ; 5 j ; s2 h N e s2 Rj jF 2 1 Rj j BC 1 (13) assume that bi N(0, ( , t ¼ )), i, j N(0, i), 1 2rj 2 4rj e e and Cov( i, j, i, j9) ¼ 0. 3. Use the estimates for 1 and 2, and determine the where r is the recombination fraction at meiosis be- j upper integration limit (u)fort. Ð Ð tween the beginning and the end of chromosome j.Note u u 4. Determine N by calculating N 5 f ðC; tÞdt= dt: that the result for R jF 2isconsistentwithHaldane and 0 0 j 5. Repeat steps 1–4 a large number of times. The stan- Waddington (1931). Given a known genetic map, r j dard deviation of the resulting bootstrap distribution can be calculated using any map function, as long as is an approximation for the standard error of N : its inverse is available (Liu 1998). The probability of a recombinant type implies at least one recombination Note that these sampling errors will be slightly under- breakpoint in the interval, thus generating two potential estimated, because the values for s and g(t) are assumed QTL segments flanking the breakpoint. Assuming s ¼ 1 known from molecular analysis, and the variation in (all epialleles are stable), the expected maximum num- Nmax is also neglected for simplicity.