<<

Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 1278

Complex Trait

Beyond Additivity

SIMON FORSBERG

ACTA UNIVERSITATIS UPSALIENSIS ISSN 1651-6206 ISBN 978-91-554-9754-5 UPPSALA urn:nbn:se:uu:diva-307837 2016 Dissertation presented at Uppsala University to be publicly examined in B22, BMC, Husarg. 3, Uppsala, Friday, 13 January 2017 at 10:15 for the degree of Doctor of Philosophy (Faculty of Medicine). The examination will be conducted in English. Faculty examiner: Assistant Professor Julien Ayroles (Princeton University).

Abstract Forsberg, S. 2016. Complex Trait Genetics. Beyond Additivity. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 1278. 45 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-554-9754-5.

The link between the and the of an organism is immensely complex. Despite this it can, to a great extent, be captured using models that assume that gene variants combine their effects in an additive manner. This thesis explores aspects of genetics that cannot be fully captured using such additive models. Using experimental data from three different model organisms, I study two phenomena that fall outside of the additive paradigm: genetic interactions and genetic variance heterogeneity. Using the model plant Arabidopsis thaliana, we show how important biological insights can be reached by exploring loci that display genetic variance heterogeneity. In the first study, this approach identified alleles in the gene CMT2 associated with the climate at sampling locations, suggesting a role in climate adaption. These alleles affected the genome wide methylation pattern, and a complete knock down of this gene increased the plants heat tolerance. In the second study, we demonstrate how the observed genetic variance heterogeneity was the result of the partial linkage of many functional alleles near the gene MOT1, all contributing to Molybdenum levels in the leaves. Further, we explore genetic interactions using data from dogs and budding yeast (Saccharomyces cerevisiae). In the dog population, two interacting loci were associated with fructosamine levels, a biomarker used to monitor blood glucose. One of the loci displayed the pattern of a selective sweep in some of the studied breeds, suggesting that the interaction is important for the phenotypic breed-differences. In a cross between two strains of yeast, with the advantage of large population size and nearly equal allele frequencies, we identified large epistatic networks. The networks were largely centered on a number of hub-loci and altogether involved hundreds of genetic interactions. Most network hubs had the ability to either suppress or uncover the phenotypic effects of other loci. Many multi-locus allele combinations resulted in that deviated significantly from the expectations, had the loci acted in an additive manner. Critically, this thesis demonstrates that non-additive genetic mechanisms often need to be considered in order to fully understand the genetics of complex traits.

Keywords: genetic interactions, epistasis, additivity, GWAS, vGWAS, Genetic mapping, yeast, Arabidopsis Thaliana, dog

Simon Forsberg, Department of Medical Biochemistry and Microbiology, Box 582, Uppsala University, SE-75123 Uppsala, Sweden.

© Simon Forsberg 2016

ISSN 1651-6206 ISBN 978-91-554-9754-5 urn:nbn:se:uu:diva-307837 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-307837)

I am the family face; Flesh perishes, I live on, Projecting trait and trace Through time to times anon, And leaping from place to place Over oblivion.

The years-heired feature that can In curve and voice and eye Despise the human span Of durance -- that is I; The eternal thing in man, That heeds no call to die

Thomas Hardy - Heredity

Cover image: Illustration of allele combinations, the space of genotypic possibilities, at 2 to 5 loci. From Sewall Wrights 1932 paper The roles of mutation, inbreeding, crossbreeding, and selection in evolution. Reproduced with permission from GENETICS SOCIETY OF AMERICA

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I. Shen X, De Jonge J, Forsberg SKG, Pettersson ME, Sheng Z, Hen- nig L, & Carlborg Ö. (2014) Natural CMT2 Variation Is Associated With Genome-Wide Methylation Changes and Temperature Season- ality. PLOS Genet. 10 1–14. II. Forsberg SKG, Kierczak M, Ljungvall I, Merveille A-C, Gouni V, Wiberg M, Lundgren Willesen J, Hanås S, Lequarré A-S, Mejer Sørensen L, Tiret L, McEntee K, Seppälä E, Koch J, Battaille G, Lohi H, Fredholm M, Chetboul V, Häggström J, Carlborg Ö, Lind- blad-Toh K, & Höglund K. (2015) The Shepherds’ Tale: A Genome- Wide Study across 9 Dog Breeds Implicates Two Loci in the Regu- lation of Fructosamine Serum Concentration in Belgian Shepherds. PLoS One 10, 1–17. III. Forsberg SKG, Andreatta ME, Huang X-Y, Danku J, Salt DE, & Carlborg Ö. (2015) The Multi-allelic of a Vari- ance-Heterogeneity Locus for Molybdenum Concentration in Leaves Acts as a Source of Unexplained Additive Genetic Variance. PLOS Genet. 11, 1–24. IV. Forsberg SKG, Bloom JS, Sadhu M, Kruglyak L, & Carlborg Ö, Accounting for genetic interactions improves modeling of individual quantitative trait phenotypes in yeast. bioRxiv (2016). doi:10.1101/059485. Manuscript in review.

Reprints were made with permission from the respective publishers.

List of Papers (not included in the thesis)

1. Saha I, Zubek J, Klingström T, Forsberg S, Wikander J, Kierczak M, Maulik U, & Plewczynski D. (2014) Ensemble learning predic- tion of protein-protein interactions using proteins functional annota- tions. Mol. Biosyst. 10, 820–30. 2. Sjöstrand K, Wess G, Ljungvall I, Häggström J, Merveille A-C, Wi- berg M, Gouni V, Lundgren Willesen J, Hanås S, Lequarré A.-S, Mejer Sørensen L, Wolf J, Tiret L, Kierczak M, Forsberg S, McEntee K, Battaille G, Seppälä E, Lindblad-Toh K, Georges M, Lohi H, Chetboul V, Fredholm M, & Höglund K, (2014) Breed Dif- ferences in Natriuretic Peptides in Healthy Dogs. J. Vet. Intern. Med. 28, 451–457. 3. Kierczak M, Jabłońska J, Forsberg SKG, Bianchi M, Tengvall K, Pettersson M, Scholz V, Meadows JRS, Jern P, Carlborg Ö, & Lind- blad-Toh K. (2015) cgmisc: enhanced genome-wide association analyses and visualization. Bioinformatics 31, 3830. 4. Höglund K, Lequarré A-S, Ljungvall I, Mc Entee K, Merveille A-C, Wiberg M, Gouni V, Lundgren Willesen J, Hanås S, Wess G, Mejer Sørensen L, Tiret L, Kierczak M, Forsberg SKG, Seppälä E, Lind- blad-Toh K, Lohi H, Chetboul V, Fredholm M, & Häggström J. (2016) Effect of Breed on Plasma Endothelin-1 Concentration, Plasma Renin Activity, and Serum Cortisol Concentration in Healthy Dogs. J. Vet. Intern. Med. 30, 566–573. 5. Zan Y, Shen X, Forsberg SKG, & Carlborg Ö. (2016) Genetic Regulation of Transcriptional Variation in Natural Arabidopsis tha- liana Accessions. G3 Genes|Genomes|Genetics 6, 2319–2328.

Contents

Introduction ...... 11 Historical Background ...... 12 Genetics Terminology ...... 13 Non-additivity ...... 13 1. Epistasis ...... 14 2. Genetic variance heterogeneity ...... 20 Present investigations ...... 24 Paper I ...... 24 Paper II ...... 27 Paper III ...... 28 Paper IV ...... 30 Summary and future perspectives ...... 34 Supplements ...... 38 Statistical Models ...... 38 Acknowledgements ...... 40 References ...... 42

Abbreviations

bp basepair CMT2 chromomethylase 2 COPT6 copper transporter 6 DNA deoxyribonucleic acid GAPDH glyceraldehyde-3-phosphate dehydrogenase GWAS genome wide association study kb kilobase LD linkage disequilibrium LETM1 leucine zipper and EF-hand containing trans- membrane protein 1 MC1R melanocortin 1 receptor MOT1 molybdate transporter 1 QTL SNP single nucleotide polymorphism TE transposable element vGWAS variance heterogeneity GWAS vQTL variance heterogeneity QTL

Introduction

Man has probably always had some conception of inheritance. By looking around us, we realize that offspring inherit characteristics from their parents. The field of genetics involves the formal study of inheritance. As far as we know, all organisms carry a genome containing the information necessary to produce said organism. The overall goal of genetics is to understand the link between genotype, the genetic makeup of a certain individual, and pheno- type, the traits displayed by this individual. What are the underlying genetic factors that give rise to individual phenotypes? What is it that makes one individual tall and another short, one sick and another healthy? What are the mechanisms? These are a few of the questions that genetics tries to answer. Although recent technical advances has allowed us to “read” genomes on an unprecedented scale, I would argue that our understanding of these funda- mental questions are still, at best, rudimentary. The potential applications of genetic knowledge are abundant. For in- stance, identifying the genetic factors responsible for susceptibility to human diseases is a crucial step in improving diagnosis, prevention and treatment. Our understanding of fundamental genetics also sets the limit for endeavors such as genetic engineering. This thesis is but a small attempt to push our understanding a bit further. In particular, it examines the assumption of addi- tivity – that the joint effect of a set of loci can be obtained by summing up the effects of its parts – typically made by quantitative geneticists when try- ing to find the genetic basis of complex traits. As we explore in these four papers, non-additive genetic mechanisms can have profound implications for our ability to find the genetic factors underlying individual phenotypes, as well as our ability to predict these phenotypes once we know the underlying factors. We also examine the potential implications of non-additive mecha- nisms for how populations respond to natural and/or artificial selection. We find several examples of non-additive genetic mechanisms and suggest that they need to be considered for the field of genetics to better understand the basis of complex traits.

11 Historical Background The field of genetics is usually said to begin with , and his famous experiments with peas in the 1860s1. He worked with discrete pheno- types, presence or absence of wrinkles, and discovered the today well-known patterns of dominant and recessive inheritance. Around the same time, one of the most famous works in the history of science, the Origin of Species2, was published by Charles Darwin. In this work, Darwin presents the theory of evolution through natural selection, laying the basis for all of modern biology. Evolution requires that traits are inherited from one generation to the next, but Darwin, being unaware of Mendel’s work, did not know the mechanisms of this process. It was not until the work of Mendel was redis- covered in the beginning of the 20th century that genetics could be incorpo- rated into the evolutionary theory. At first, however, this presented an appar- ent paradox. Inheritance, according to the laws of Mendel, is discrete. But according to Darwin, evolution progresses in a continuous manner and not through discrete jumps. Also, most of the natural variation in phenotypes that we observe in nature, such as height in humans, is continuous. How can a discrete mechanism of inheritance give rise to continuous phenotypes? One scientist that addressed the inheritance of continuous traits was Francis Galton. In the wake of Darwins discoveries, Galton meticulously studied various traits in human populations, and also created many of the fundamental tools in statistics, such as regression and the concept of correlation. By comparing indi- viduals with various degrees of relatedness, he developed the notion of heritabil- ity and founded the field of biometrics3,4. The biometrical approach was eventu- ally reconciled with Mendels laws of inheritance through the theoretical ad- vancements of Ronald Fisher5. Fisher, a brilliant geneticist and statistician, pro- vided a conceptual framework of how to think about inheritance that made it possible to apply concepts from Mendelian genetics to continuous, in genetics usually referred to as quantitative, phenotypes. The essence of the model that Fisher proposed is that a large number of Mendelian factors (alleles), each mak- ing a small contribution, together produce the phenotype (the so called infinites- imal model)5. The model is additive, meaning that the combined effect of all alleles is produced by simply summing up their individual effects. Fisher also added the additional parameters dominance and epistacy, which where mainly statistically motivated nuisance parameters, included to account for anomalies in the model. If we for instance consider a quantitative phenotype such as human height, the assumes that the height of any individual is the result of all alleles carried by that individual. Each allele contributes a very small amount to the phenotype and if we add all their effects together, we get the height of that person. It is worth mentioning that at the time of Fisher, knowledge about the mo- lecular mechanisms of inheritance was very limited. It was not until the 1950s that DNA became known as the molecule that carries information from one

12 generation to the next6, and not until recently that we have the technical means of actually reading the DNA7,8. When Fisher formulated his model, concepts such as gene, locus and allele, where still abstract notions. Only later did they come to have physical meanings, related to the DNA molecule. It has also been obvious to geneticists for many years that Fishers basic assumptions are simplifications. Still, much of todays genetic research, such as genome wide association studies (GWAS), relies on additive analysis models.

Genetics Terminology As established above, heritable traits are, to various degrees, determined by information contained in the genome. I will here start by introducing the basic genetic terminology that will be used throughout this thesis. A certain location in the genome is referred to as a locus (plural: loci). In genetics, we typically as- sume that every locus can display two alternative genetic variants. If for instance the locus in question is a Single Nucleotide Polymorphism (SNP), a given indi- vidual can carry either nucleotide A1 or nucleotide A2 at this position. These alternative genetic variants are referred to as alleles. The term allele might also refer to more complex genetic variants than SNPs, such as presence/absence of larger chromosomal segments. In diploid organisms such as humans, every in- dividual carries two copies of every chromosome, one inherited from its mother and the other from its father. This means that at every locus, there are three pos- sible allele combinations: A1A1, A1A2, and A2A2. These allele combinations are referred to as . In a haploid organism such as yeast, this is not the case, as they only have one copy of each chromosome. A highly inbred organism, such as Arabidopsis thaliana, might also for most purposes be regarded as hap- loid, since the genotype A1A2 almost never occurs. In this thesis, I also use the term genotype to refer to allele combinations across two or more loci. For ex- ample, A1A2B1B1 is a two locus genotype where the individual carries the alleles A1 and A2 at locus A, and two B1 alleles at locus B. A trait, for which we are trying to find the genetic determinants, is typically referred to as a phenotype. In this thesis I use the two terms trait and phenotype interchangeably.

Non-additivity As described above, the prevailing paradigm in quantitative genetics since the time of Fisher has been that of additivity. This thesis deals with two forms of non-additive genetic mechanisms: epistasis and genetic variance heterogeneity. These two phenomena are often, but not always, related. I will introduce them in turn, also providing some general and conceptual remarks regarding statistical modeling.

13 1. Epistasis For continuous phenotypes, there are typically a large number of contrib- uting alleles located at different loci. A central question when trying to un- derstand the genetic mechanism of a certain phenotype is then: do the differ- ent alleles act independently of each other? If we imagine a hypothetical allele A, does it always have the same effect on the phenotype, or does this effect vary depending the presence or absence of certain other alleles? Such dependencies between alleles at different loci are called genetic interactions, or epistasis, and they violate the assumption of additivity (Figure 1). Throughout this thesis, I use the terms epistasis and genetic interaction inter- changeably. Although most geneticists agree that epistasis exist, they do not agree on how common and influential the phenomenon is, and whether it needs to be considered in genetic studies. The importance of epistasis has been debated for almost 100 years, and it is still the source of much contro- versy within the field of genetics9–11.

II

Ii Dominant white genotype (KIT)

ii

EE Ee ee Extension genotype (MC1R)

Figure 1. Epistasis. The figure shows an example of the classical form of epistasis, as Bateson originally described the phenomenon. The I allele at the KIT locus, which confers white coat color, is dominant over all genotypes at the MC1R locus. The I allele thereby masks the effect of the MC1R locus, which only displays a phe- notypic effect in individuals with the recessive genotype ii at the KIT locus. (Inspired by Carlborg and Haley11. Photos by Palmount45, Keith Evans, David Merret, avail- able under Creative Commons Attribution license)

14 Epistasis is inherently difficult to study, and this is a major reason why its importance is still largely an open question. Imagine for instance that we want to test the hypothesis that the phenotypic effect of allele A, at a certain locus, is dependent on the presence/absence of allele B, at another locus. We would then need to collect observations of all combinations (genotypes) between the alleles A and B. Working with a diploid organism such as hu- mans, there are three possible genotypes at one locus, which means that there are 32 = 9 possible genotypes when considering two loci. To test the potential genetic interaction between alleles A and B, we thus need to collect observations of all 9 genotypes (naturally, to achieve any degree of statistical significance when testing our hypothesis, we need multiple observations per genotype). Generalizing this to more than two loci, the number of possible genotypes grows exponentially as 3n, where n is the number of loci. In an ideal scenario, this means that for every additional locus that we want to study, the amount of data required triples. If we are collecting data from a natural population, we are often in an even worse situation, since some al- leles are often quite rare. In order to find a sufficient number of observations of every genotype in a natural population, we thus need to sample even more individuals than indicated by 3n. Figure 2 illustrates the space of genotypic possibilities across multiple loci in a haploid organism, where the number of possible genotypes is 2n. As a consequence of this problem, most quantita- tive genetic studies silently assume an absence of genetic interactions, and even studies that specifically focus on epistasis rarely study interactions be- tween more than pairs of loci. It should be added that this is a well-known problem not just in genetics, but also in other fields that work with high di- mensional data. Within the fields of machine learning and computer science, it is often referred to as the curse of dimensionality.

15

Figure 2. The space of genotypic possibilities. Schematic illustration of all possible allele-combinations (genotypes) when considering from 2 to 5 bi-allelic loci in a haploid organism. (From Wright 1932, Reproduced with permission from GENET- ICS SOCIETY OF AMERICA)

The presence of epistasis might have profound implications for our ability to predict the phenotype of an individual, based on his/her genotype. If the phenotypic effect of alleles differs between individuals, depending on their genetic makeup at other loci, this must be taken into account in order to pre- dict the individual phenotypes. Epistasis might also have large implications for natural and artificial se- lection, and consequently for how evolution progresses12–14. This was the source of a famous controversy between two of the founding fathers of quan- titative genetics, R. A. Fisher and Sewall Wright. In short, Wright consid- ered epistasis to be of vital importance for the evolutionary process. He de- scribed his view through the metaphor of an adaptive landscape, where epi- stasis leads to multiple isolated adaptive peaks, separated by regions of low- er fitness15. This scenario enables certain to remain “hidden” in a population, and the phenotypic effect of such hidden alleles might be revealed during the course of selection. Fisher did not dispute that epistasis might occur, but he did not believe that it had any major implica- tions for the evolutionary process. In his view, the average effect of an allele across many genetic backgrounds is the driver of evolution5. All alleles with any effect on the fitness of the organism would thus either be purged from the population, or driven towards fixation. The Fisherian view leaves little room for what geneticists call standing genetic variation, where genetic variation already present in a population at the onset of selection might allow the population to adapt. Instead, the popu-

16 lation would mainly respond to selection through the appearance of novel mutations. But recent studies suggest that “hidden” or “cryptic” genetic vari- ation might indeed be present in natural populations16–18, without displaying any phenotypic effects. Only upon certain genetic or environmental changes will their effect on the phenotype be revealed. One classical example of this phenomenon is the heat shock protein HSP90, which has the ability to buffer the effect of a range of alleles19. It has been shown in many species that re- ducing the activity of this protein allows these alleles to display various phe- notypic effects20,21.

2.1 “Essentially, all models are wrong, but some are useful” When studying epistasis, geneticists use statistical models to analyze their data. In this section, I will elaborate on the merits and potential pitfalls of these models. The quote above is attributed to George E. P. Box22, and it brings up a key point in all empirical sciences. A model is always a simplifi- cation of reality, but this simplification might allow us to draw important conclusions. Take for instance a map. It is obviously a simplification, com- pared to what it describes, yet everyone would probably agree that maps are useful. We just need to be aware that there is a discrepancy between the map and reality. In quantitative genetics, we typically use linear regression models to draw conclusions about populations23,24. We also use them to make predictions, for instance within animal and plant breeding when deciding which individuals to cross in order to get the desired offspring, or within personalized medicine when deciding on the optimal drug to give to a certain patient (the reader with an interest in how linear models are utilized in quantitative genetics is referred to the appendix for a brief overview). In all of these applications, the statistical models are very useful tools. However, I believe that a lot of the controversy regarding genetic interactions stems from confusion regarding the relationship between the statistical models used in genetics, and the un- derlying reality that they aim to describe. I will try to explain this point using a simple theoretical example. Imagine that we are studying a binary phenotype in a haploid (or inbred diploid) organism. In this simple example, the phenotype is completely de- termined by the alleles at two loci A and B. At locus A, an individual can carry either allele A1 or A2, and at locus B, either B1 or B2. This gives four possible genotypes. Figure 3 shows the phenotype produced by each geno- type, where the genotype A1B1 gives a unique phenotype.

17 B1 B2

A1

A1/A2 B1/B2

A2

Figure 3. Two-locus genetic interaction. A genetic interaction between the two bi- allelic loci A and B in a haploid organism. The subscripts 1 and 2 indicate the two alternative alleles at each locus. The two possible phenotypes is indicated by + (grey) and – (white). The genotype A1B1 produces the + phenotype and the other three genotypes produce the – phenotype.

When fitting a statistical model to data produced by this genetic interaction, we need to make a choice on how to parameterize the data. In other words, we need to define numbers to represent the different alleles. We might for instance parameterize the scenario in figure 3 as:

1, �������� = �! 1, �������� = �! �! = �! = �! = �! ∗ �! 0, �������� = �! 0, �������� = �!

With this parameterization, the interaction term xI has a perfect fit to the data, since it will take on the value 1 for the allele combination A1B1 and 0 for any other combination. If we fit the typical linear model � = � + �!�! + �!�! + �!�! + �, the coefficient �! will thus perfectly capture the genetic mechanism in this example. In statistical terms, it will explain all the genetic variance, and we will find that none of the coefficients �! or �! are signifi- cant. We might thus conclude that the genetics of this phenotype is com- pletely epistatic, or in other words that it is completely non-additive. How- ever, if we use another parameterization such as:

1, �������� = �! 1, �������� = �! �! = �! = �! = �! ∗ �! −1, �������� = �! −1, �������� = �! we would arrive at a rather different conclusion. In this case, none of the parameters in the model have a perfect fit to the data. If we fit the same line- ar model, but use this parameterization, we will find that all three coeffi- cients �!, �!, and �! are significant. Based on this statistical analysis, we might thus conclude that the genetics of this phenotype is somewhat epistat-

18 ic, or in statistical terms that there is a mixture of additive and non-additive genetic variance. The point here is that the only difference is how we choose to define our model, the underlying biology has not changed. In a typical genetic study, the choice of parameterization is arbitrary. The parameters representing the genotypes are just dummy variables, and few geneticists give much thought to how they define them. This may not pose a problem depending on what questions we are trying to answer. But as this simple example illustrates, drawing conclusions about genetic architectures based on the estimated pa- rameters in a statistical model can be problematic. This reasoning can be extended to more loci and to variance component estimates by generalizing the example above in a mixed model framework25. That is however beyond the scope of this thesis. The example given will suffice to illustrate the gen- eral point that parameter estimates, such as effect sizes and variance compo- nents, might reflect the assumptions of the model rather than the underlying biology. The parameterization problem has received some attention in the litera- ture regarding functional versus statistical epistasis. As many authors have pointed out10,13,26, the word epistasis has several slightly different meanings, and this multitude of definitions can cause confusion. When first introduced over a century ago by Bateson1, it referred to the phenomenon where one allele masks the phenotypic effect of another (Figure 1). Later5, Fisher intro- duced the term epistacy, referring to statistical interactions in a regression model relating genotypes to phenotypes. Over time, Fishers term fell out of use, being replaced by epistasis. The two definitions are however not equiva- lent, as the simple example above illustrates. To distinguish the two, the first has variously been called “physiological epistasis”27 or “functional epista- sis”28, while the later is often called “statistical epistasis”13. I will here use the terms functional and statistical epistasis. Proponents of the later defini- tion often judge the importance of epistasis, or lack thereof, by the phenotyp- ic variance explained by interaction terms in a regression model (�! in the example above)9,29. As we have seen, this is problematic, since there is not a one to one correspondence between parameter estimates in a linear model and the genetic architecture. The amount of variance explained is also de- pendent on how common the various alleles are (allele frequencies) in the population under study9,10. Hence, this approach tells us how much addition- al phenotypic variance that is explained by accounting for epistasis, with a certain model parameterization, in a certain population. In applications such as animal or plant breeding, this might be all we care about. But it does not necessarily tell us about the importance of epistasis in the genetic architec- ture of this trait. The subject of parameterization, and how best to model genetic interactions, is rather technical, and might be somewhat obscure to the general reader. The underlying biology that all models try to capture is

19 however simply that certain combinations of alleles produce certain pheno- types. In summary, the presence of epistasis can have profound implications for our ability to predict the phenotype of an individual based on its genotype. By creating multiple adaptive peaks in a “rugged” adaptive landscape, epi- stasis can also have implications for both natural and artificial selection. Epistasis is however difficult to study, and there are some confusion regard- ing how best to model it. The additional amount variance explained by a model that accounts for epistasis depends not only on the genetic architec- ture, but also on our parameterization and on the allele frequencies in the population under study. In paper IV, we empirically explore many of the aspects of epistasis that I have raised in this section. Among other things, we show that even under extensive functional epistasis, an additive model can explain a majority of the phenotypic variance.

2. Genetic variance heterogeneity Variance heterogeneity is the phenomenon where the variability of a certain trait is different between groups. Imagine for instance that we have one group of humans including both very short and very tall individuals, and a second group of individuals whose height are all close to 170cm. This would be a case of variance heterogeneity, where there is a difference in trait vari- ance between the two groups of individuals. Note that the mean height does not have to be different between the groups in order for this to occur. When the difference in variance is genetically determined, we call it genetic vari- ance heterogeneity. This means that groups of individuals with different genotypes at a given locus differ in the variance of the trait of interest, but not necessarily in the mean (Figure 4). The terminology used to refer to this phenomenon is not universally agreed upon among geneticists. Here, as in papers I, III, and IV, I use the term genetic variance heterogeneity. There are however other terminologies in the literature, as I will get back to. I will refer to loci that display genetic variance heterogeneity as vQTL30. The concept of genetic control of trait variability is not new in genetics. Already in the 1940s, Waddington presented the idea of canalization, sug- gesting that natural selection acts to produce traits that are robust to envi- ronmental and genetic perturbations31. He partly based his ideas on the ob- servation that natural populations often are less variable than artificial popu- lations of the same species. It is however only relatively recently that some geneticists have started to look for individual alleles that affect not only the mean, but also the variance of traits30,32,33. Although a growing body of evi- dence suggests that phenotypic variability can be genetically controlled33–35, this phenomenon still receives relatively little attention in quantitative genet- ics.

20 ...CAGAGATTCTGCGGGCTAATGCG...... CAGAGATTCTGTGGGCTAATGCG...

Phenotype

Figure 4. Genetic Variance Heterogeneity. Illustration of the phenotypic distribu- tions of two alternative genotypes. Individuals with the genotype C display a narrow phenotypic distribution, whereas individuals with the genotype T display more vari- able phenotypes. There is no mean difference between the two genotypes, and the locus would therefore not be detected using current popular analysis methods. It could however be detected using methods designed to detect genetic variance heter- ogeneity (vGWAS).

Searching for vQTLs might be of interest for several reasons. But since they can be caused by a number of different mechanisms, the subsequent interpre- tation is not straightforward. In short, the observation of a vQTL indicates that something beyond the normal assumptions in quantitative genetics is influencing the trait. In this section, I will briefly outline three potential mechanisms that can give rise to vQTLs. Genetic variance heterogeneity is a somewhat abstract concept, since var- iance only has a meaning for groups, not for individuals. What does it mean that a certain allele increases the variance of a trait? There is an amount of literature on this subject, and different authors have thought about it in dif- ferent ways. In an influential paper from 2010, Hill and Mulder32 proposed that “the environmental variation” can be regarded as a phenotype in its own right. One can then invoke much of the quantitative genetics methodology to search for genetic determinants of this phenotype. They consequentially called this phenomenon “genetic control of the environmental variation”, a terminology which implies that it is the randomness, or instability, of the trait that is genetically controlled. One such example was identified in Ayroles et al34, a study of variability in locomotor behavior in fruit flies. The authors demonstrated that the degree of variability was heritable, and using a GWAS they identified several genes affecting this variability. In addition to these cases where the randomness of the trait is under some degree of genetic control, there are also other mechanisms that can give rise

21 to genetic variance heterogeneity. One such mechanism is interactions, and in particular genetic interactions. Rönnegård and Valdar30 describe how a locus that is involved in a gene-by-gene (GxG) or a gene-by-environment (GxE) interaction might be expected to display genetic variance heterogenei- ty. If we observe a vQTL, it could thus be an indication that this locus is interacting with some other unknown factor. Consequently, looking for vQTL:s has been suggested as a potential “short cut” to detect epistasis30,36– 38. Finally, genetic variance heterogeneity can also be caused by the exist- ence of more than two functional alleles in partial linkage with each other. GWAS and QTL-mapping studies typically assume that every locus can display two alternative alleles (bi-allelic), but it is well known that more than two functional alleles might be present relatively close to each other in the genome. If we, in such a scenario, analyze a bi-allelic genetic marker located at this locus, we might observe genetic variance heterogeneity. This vQTL would thus indicate that the locus contains more than two alleles with a phe- notypic effect. In paper III, we identify such an example and also provide a thorough description of how and when multi allelic loci are expected to cause variance heterogeneity. A couple of things are worth pointing out regarding epistasis, and multi allelic loci, as potential causes of genetic variance heterogeneity. Unlike the first scenario outlined by Hill and Mulder, in these two other scenarios, there are no alleles that increase the randomness. In the epistatic scenario, we ob- serve an increased variance because one allele allows other alleles to display their effects. One can thus think of the genetic variance heterogeneity as a “side effect” of epistasis. It arises when we look at the average effect of cer- tain alleles, in cases when we should really be looking at their joint effect together with alleles at other loci (see paper IV). This also means that if we were to make many genetically identical individuals that all carry the high variance allele, we would not expect this group to be any more variable30. It should be noted though that not all forms of epistasis are expected to cause genetic variance heterogeneity. This is important to bear in mind, since searching for vQTLs has been suggested as a potential way of detecting epi- stasis – it will not detect all epistasis. Also in the third scenario, the variance heterogeneity can be thought of as a “side effect”. Here, it arises when we analyze a locus as if it only contained two alternative alleles, when in fact it contains several. In most GWAS or QTL-mapping studies this is the only course available, since we only have information about so called genetic markers throughout the genome. This means that even if multi allelic loci exist in our study population, we usually don’t have genotypic data about all these alleles. The only information avail- able to us is the genetic markers, each typically containing two alternative alleles, which we hope are linked to the alleles that actually affect the pheno- type. It is impossible for these two alleles to accurately represent a multi-

22 allelic locus, but the observation of variance heterogeneity can alert us to the fact that there might be multiple alleles with a phenotypic effect at this locus. It is then possible to design follow up experiments to verify this. In the same way as has been proposed regarding epistasis30,37,38, searching for vQTLs might thus be a way to detect multi-allelic loci.

23 Present investigations

The four studies presented in this thesis explore different aspects of genetics that cannot be fully captured using standard additive models. In this pursuit, we utilize three different organisms: the domestic dog Canis lupus familiar- is, the plant Arabidopsis thaliana, and the budding yeast Saccharomyces cerevisiae.

Paper I In this investigation, we studied the plant A. Thaliana, a widely used model organism in plant research39. Arabidopsis Thaliana is a weed native to Eura- sia, and it is found across a wide range of climates40. It is thus an attractive model for studying how organisms genetically adapt to different environ- ments. We used two publically available datasets. In the first dataset, the RegMap panel41, we obtained SNP genotypes and climate data for the sam- pling sites for 948 A. Thaliana clonal lines (accessions). The second dataset comes from the community effort 1001 genomes project42 (1001G), aiming to sequence the genomes of a large number of A. Thaliana accessions, repre- senting the native Eurasian and North African range of the specie. Here, we obtained genotypes and corresponding climate data for 665 A. Thaliana ac- cessions. In both datasets, the plants have been sampled across the native range of the species, and we scanned their genomes for alleles associated with the climate at the sampling sites. The design of this investigation was that of an association study, using various climate variables at the sampling sites as phenotypes. Such associa- tions are problematic in a GWAS setting, since climate variables are inher- ently confounded with population structure. If we for instance search for loci associated with warm versus cold climate, we will in effect largely be identi- fying genetic differences between plants in southern versus northern Eurasia. Some of these differences might be related to the plants ability to cope with the climate. But most likely, many of the differences are due to the fact that the southern and northern populations have been isolated from each other during a long time, allowing them to drift apart. To overcome this problem, instead of performing a conventional GWAS analysis, we scanned the ge- nome for loci displaying genetic variance heterogeneity (vGWAS). This method can detect signals of selection where one allele is advantageous in a

24 particular climate and the opposite allele works equally well across a range of climates. In contrast, a GWAS analysis would detect signals where alter- native alleles are present in different climates. The variance heterogeneity signal has the advantage of being less confounded with population structure, and it also reflects a mode of selection that might be missed using a conven- tional GWAS analysis. Using this method, we identified sixteen novel loci that were associated with the climate at the sampling site, without being confounded with popula- tion structure. These loci are thus likely important for climate adaption. Spe- cifically, we found an association between a polymorphism in the gene CMT2 and temperature seasonality, a measurement of the temperature range throughout a year at a certain geographic location. High temperature season- ality means that there is a large difference in temperatures, whereas low val- ues indicate a stable temperature throughout the year. We found a premature stop codon in the first exon of the CMT2 gene. This CMT2STOP allele was present in plants collected from sites with a wide range of temperature sea- sonalities, while the opposite CMT2WT allele was confined to regions with low temperature seasonality (Figure 5). We replicated this finding in the 1001G dataset and also found two additional mutations disrupting the CMT2 gene; one premature stop codon (CMT2STOP2) and one frameshift mutation. CMT2STOP2 displayed the same type of association to temperature seasonality as CMT2STOP, whereas the frameshift mutation only was found in two plants. The gene CMT2 is involved in epigenetic regulation43,44, and we could show that the CMT2STOP allele affected the genome wide methylation pattern in transposable elements. To functionally explore the role of CMT2 in tempera- ture response, we subjected wild type plants and plants with a non-functional CMT2 gene to heat stress. The plants with a non-functional CMT2 displayed a higher survival rate under severe heat stress, and a greater tolerance to milder heat stress, compared to wild type plants.

25

Figure 5. Geographic distribution of three CMT2 alleles in A. thaliana accessions.

The CMT2 wildtype allele (CMT2WT gray open circles) is primarily found in regions with low temperature seasonality, whereas the disruptive alleles (CMT2STOP filled triangles, CMT2STOP2 open triangles) are present across a larger range of climates (A), both in the RegMap (blue) and 1001G (red) collections. The resulting variance heterogeneity between wildtype and disruptive alleles is highly significant, as illus- trated by the quantile plots (B). The color code from red to purple indicates the temperature seasonality across the map. (From paper I, available under Creative Commons Attribution license)

This investigation shows that by searching for loci that display genetic vari- ance heterogeneity, we can overcome some of the problems of confounding between phenotype and population structure. Such confounding is a common problem in GWAS45,46. The study also shows that the vGWAS approach allows us to find signatures of selection that might otherwise have been overlooked. The conventional approach in a scenario like this would be to look for loci where alternative alleles have been selected for in different climates, typically leading to a selective sweep. While such cases might cer- tainly occur, they will be hard to disentangle from genetic drift due to the confounding with population structure. This is especially the case if selection has acted on standing genetic variation, rather than novel mutations, in which case we might not expect to observe a classical hard sweep47. Here, we instead look for cases where one allele is present in a narrow climate range, and the opposite allele is tolerated across a wider range. This pattern could for instance arise if a certain allele arises that allows the plant to colo-

26 nize a new climate, but which carries no fitness cost in the climate in which the plant is already present. This new allele would then be selected for in the new climate, but be selectively neutral in other climates, giving rise to just the pattern that we observe for the CMT2STOP alleles.

Paper II In this investigation, we studied the genetics of glucose metabolism in domes- tic dogs through a GWAS. The dog has accompanied man for thousands of years, sharing our living environment and to some extent eating the same food as us. Dogs also suffer from many of the same welfare diseases as humans, such as asthma, allergies, and diabetes48. This makes them an interesting mod- el organism for studying these diseases. The prevalence of diabetes mellitus varies greatly between dog breeds48, suggesting a genetic determinant for the disease. By identifying alleles affecting blood glucose levels in dogs, we aimed to gain insights into the genetics of diabetes mellitus and why its preva- lence varies across breeds. In particular, we were interested in potential non- additive genetic mechanisms. Through an international collaboration49, 535 dogs from nine different breeds and five European countries where sampled. Each dog was genotyped and its level of fructosamine, a biomarker that re- flects the average blood glucose levels over the last 15-20 days, was measured. In the Belgian shepherd breed, we found an association between fructosa- mine levels and a locus A at chromosome 3. Although this was not the only breed harboring both alleles at this locus, this association did not appear in any of the other eight breeds. One possible explanation for such a scenario is epi- stasis, where something in the genetic background of the other breeds prevents locus A from displaying its effect. To identify loci that might be interacting with locus A in this way, we looked for genomic regions where the Belgian shepherds significantly differed from the other breeds. In this way, we identi- fied a locus on chromosome 5 (locus B) where the Belgian shepherds were close to fixation for one allele B1, and the opposite allele B2 dominated in the other breeds. By analyzing the joint effect of these two loci in the other eight breeds, we found evidence for an interaction between them. Among the minority of the non-Belgian shepherd dogs that carried the B1 allele, locus A displayed an effect similar to what we observed in the Belgian shepherds. The majority of non-Belgian shepherd dogs, however, carried the B2 allele, and among these dogs locus A displayed a much smaller effect. Therefore, we hypothesize that this interaction explains why locus A display a genome wide significant asso- ciation to fructosamine only in Belgian Shepherd dogs. In human GWAS, it has frequently been observed that associations do not always replicate across ethnically diverse populations. For instance, GWASs of type 2 diabetes have identified associations to different loci in different

27 ethnic groups50–52. A common theory to explain this observation is that alt- hough we genotype the same SNPs as genetic markers in the different popula- tions, the underlying causal alleles might not be present in all populations. This theory is certainly plausible, and it does have the virtue of simplicity. The interaction observed in this study is however interesting, since it presents a possible alternative explanation to this phenomenon. Maybe epistasis is one of the reasons why associations found in GWAS do not generalize across differ- ent populations. Of course, a third possible explanation is that some associa- tions are simply false positives. As a general model for studying epistasis, dogs are not optimal because of noisy phenotype data and the problem of getting sufficient sample sizes. I therefore started to look for a better model system, leading me to paper IV in this thesis.

Paper III Here, we once again studied the plant A. Thaliana. In a previous paper by Shen et al33, a vQTL was detected where plants with alternative alleles dis- played a very different variance in leaf molybdenum concentrations. This vQTL encompassed the gene MOT1 (Molybdate transporter 1). Molybdenum is an essential element for plant growth, and both excess and deficiency has a large impact on plant development53. In this investigation, we first wanted to see if the genetic variance heterogeneity associated with the MOT1 locus could be replicated in an independent study. Secondly, we were interested in under- standing the underlying mechanism giving rise to the observed variance heter- ogeneity. In this pursuit, 340 previously genotyped A. Thaliana plants where grown under identical greenhouse conditions, and the concentration of molyb- denum in their leafs was measured. This work was performed in collaboration with professor David Salts lab at the University of Aberdeen. The previously identified vQTL was replicated in this independent dataset, where we observed a sevenfold difference in phenotypic variance between plants with alternative alleles at the MOT1 locus. In an attempt to identify the causal alleles, we screened the region of MOT1 using PCR fragment size differentiation, thereby identifying six non-coding structural polymor- phisms. These were then genotyped in 283 of 340 phenotyped plants. Two of the structural polymorphisms, one 53bp deletion and one 283bp duplication, where significantly associated with mean leaf molybdenum concentration. We also found two more associations between SNPs (SNP1, SNP2) and mean leaf molybdenum concentration, making a total of four independent associations in the vicinity of the MOT1 locus (Figure 6). The two structural polymorphisms, and SNP1, were all in Linkage Disequilibrium (LD) with the high variance associated SNPs across this region. The linkage was such that plants with the high variance associated SNP alleles also carried either the deletion, or the

28 duplication, or the minor allele at SNP1 (Figure 7). The deletion decreased leaf molybdenum concentration, whereas the duplication and the minor allele at SNP1 increased it. Hence, in the group of plants with either of these alleles, some had a very high phenotype and others a very low. One of the alternative SNP alleles in the vQTL signal collectively tagged this highly variable group of plants, giving rise to the observed genetic variance heterogeneity.

Figure 6. The multi allelic MOT1 locus. Schematic illustration of the four alleles associated with leaf molybdenum concentrations. The deletion decreased the molyb- denum concentration, whereas the duplication and the minor alleles at SNP1 and

SNP2 increased it. The deletion, duplication, and the minor SNP1 allele were all linked to the vQTL SNPs, giving rise to the observed variance heterogeneity. (Modi- fied from paper III, available under Creative Commons Attribution license)

Figure 7. Multiple functional alleles give rise to variance heterogeneity. There was sevenfold difference in phenotypic variance between plants with the low (top panel) and the high (bottom panel) variance associated SNP alleles. The high variance in the second group was caused by two alleles with a positive (duplication and SNP1 minor allele) and one allele (deletion) with a negative phenotypic effect. (Modified from paper III, available under Creative Commons Attribution license)

29 SNP1 was located ~25kb upstream, and SNP2 ~600kb downstream, of the MOT1 gene (Figure 6). This prompted us to consider that additional genes in this region might play a role in how the plants accumulate molybdenum. To explore this possibility, we evaluated the molybdenum levels in a range of plants, each with a disruptive T-DNA insertion in one of the 24 genes sur- rounding SNP1 and SNP2. Two of the tested T-DNA insertion plants, dis- rupted for genes in the vicinity of SNP2, displayed a significant reduction in molybdenum levels. This suggests these two genes (AT2G27030 and COPT6) as potential functional candidates for this association. The results of this study highlight the potential gains, and the difficulties, of searching for genetic variance heterogeneity. At the onset of this investi- gation, I was expecting to find evidence of epistasis causing the extremely pronounced variance heterogeneity that we observed. Instead, it turned out that the existence of multiple functional alleles at this locus, combined with a particular linkage pattern, was the cause. As far as I am aware, this is the first time this phenomenon has been reported. It opens up the possibility that multi allelic loci can be detected by searching for genetic variance heteroge- neity. At the same time, the results stress the point that genetic variance het- erogeneity can mean many different things. As stated in the introduction, the observation of a vQTL indicates that something beyond the normal assump- tions in quantitative genetics is influencing the trait. But in order to under- stand what this something is, further analyzes are typically needed.

Paper IV In this final investigation, we studied the budding yeast S. cerevisiae. Yeast is a widely used model organism in molecular genetics54. Most importantly, for the scope of this thesis, it is a very good model system when studying epistasis. Here, we used a cross between a lab strain (BY), and a vineyard strain (RM) of yeast. This generated a population of 4,390 offspring, all of which was genotyped for 28,220 SNPs, and measured for 20 growth traits. This experimental population overcomes many of the difficulties of studying genetic interactions, described in the epistasis section above. The individuals in this population are haploid, the sample size is large, and since they are siblings, all allele frequencies are close to 50%. We could thus observe every possible combination of alleles across many loci, and study the associated phenotypes. This work was performed in collaboration with the Kruglyak lab at UCLA, and it builds on pervious work presented by Bloom et al29. In a previous study, Bloom et al29 mapped 939 additive QTLs and 330 pairwise genetic interactions in this population. We used these statistical inter- actions to identify networks of interacting loci. A few loci appeared as hubs in the networks, interacting with several other loci and thereby tied the networks together (Figure 8). Many of these hubs had the ability to capacitate the phe-

30 notypic effect of other loci in the networks. They thereby acted as a sort of “on/off” switches, where the other loci showed no, or a small, phenotypic effect when combined with the “off” allele at the hub locus. The opposite (“on”) hub allele however allowed the radial loci to display their effects.

A Xylose Cobalt Chloride Manganese Sulfate YPD B ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● Zeocin E6 Berbamine Indole Acetic Acid Lactate ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Number of loci ● ● ● ● ● ● ● ● ● ● ● Copper Sulfate Neomycin Rafnose ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 50 100 150 200 250 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 2 4 6 8 10 12 ● ● ● ● ● ● ● ● ● ● Number of genetic interactions Figure 8. Many QTLs involved in pairwise genetic interactions are part of highly interconnected epistatic networks. A histogram of the number of interactions each epistatic QTL is involved in (A). Most QTLs are involved in few pairwise interac- tions, but some are involved in many, thereby acting as hubs in the epistatic net- works (B). Each circle represents a QTL and the lines represent significant pairwise interactions. Red circles highlight hub-QTL involved in five or more pairwise inter- actions.

Because of the large sample size, we could then go on to look at combinations of alleles across multiple loci in these epistatic networks. For every hub locus and five of its interactors, we estimated the phenotypic effect of each of the 26 = 64 possible multi-locus genotypes. In all networks but one, we found at least one genotype whose phenotype significantly deviated from what would be expected, had the six loci acted in an additive manner. These deviant genotypes often pro- duced the most extreme phenotypes, far beyond the expectation based on the marginal effect of every locus. We also observed that the more loci that a given locus interacted with, the larger were its effect on the phenotypic variance. This observation supports earlier suggestions that vGWAS might be utilized to iden- tify epistatic loci36,37. One network in particular caught our attention. The hub locus in this network displayed the largest capacitating effect of all hubs, as measured by the narrow sense h2. Among individuals with the “off” hub allele, h2 was 14%, whereas among individuals with “on” hub allele, h2 amounted to 55%. The radi- al loci displayed virtually no phenotypic effect when combined with the “off” hub allele. Interestingly, many of the loci in this network contained genes in the pheromone response pathway, suggesting a possible biological explanation for the observed genetic interactions.

31 The existence of genetic capacitors has been reported before, the most well known example being the chaperon HSP9019,21. These capacitors have the ability to “hide” genetic variation, which upon certain genetic or environmental chang- es can reveal its phenotypic effect55. Such cryptic genetic variation might have large implications for the evolutionary process17,56,57. The fact that we observe multiple examples of genetic capacitors supports the idea that this genetic mech- anism might be relatively common, and that cryptic genetic variation should be considered as an important driver of evolution. As I discussed in the epistasis section above, additive/marginal effects in a linear regression model will capture not only additive gene action, but also parts of non-additive mechanisms9,10. This means that genetic architectures that are functionally non-additive might still display additive genetic variance. To what extent depends on the model parameterization, and on the allele frequencies in the population under study. Having extensively characterized the joint phenotyp- ic effects of many loci in the epistatic networks, we wondered how dependent the additive genetic variance was on the allele frequencies at these loci. Given the observed genetic architectures, what amount of additive genetic variance would we observe in other populations, where the allele frequencies are differ- ent? We addressed this question using simulations: for each of the 15 networks studied in detail, we simulated populations where the genetic architectures were the ones that we observed in the real data, but the allele frequencies varied. In other words, we extrapolated from the observed population, to hypothetical populations with different allele frequencies. The results showed that under the observed genetic architectures, the amount of additive genetic variance associat- ed with every locus depends largely on the allele frequencies at the interacting loci. In the most extreme case, one locus explained 0% of the phenotypic vari- ance in some populations, and up to 58% in others, depending on the frequen- cies of its interactors. Under additivity, the variance explained by this locus would be expected to vary between 6% and 13%. Despite the fact that many multi-locus genotypes produced phenotypes that did not conform to the additive expectation, the overall improvement in pheno- typic variance explained when accounting for the genetic interactions was not that big. This is an important observation, since the amount of variance ex- plained is often used to quantify the importance of epistasis. Both according to theory and previous observations9,29, a simple additive model is expected to explain a large fraction of the genetic variance. Here, we observe that this is true even under extensive epistasis. The reason is that the number of individuals that carry the deviant genotypes only represent a small fraction of the population, whereas the majority of the individuals conform relatively well to additive ex- pectations. Simply put, for most individuals we can make pretty decent pheno- typic predictions based on their genotype without accounting for epistasis. For some individuals however, epistasis really matters, and we severely miss-predict their phenotype if we do not account for it. This observation connects to the discussion regarding the functional versus statistical views of epistasis. On the

32 one hand, there are many examples where epistasis has been shown to have very large effects on individual phenotypes58,59. On the other hand, both theory and previous observations suggests that the additive genetic variance dwarfs the epistatic genetic variance9,29. This study shows that these two views are compat- ible with each other.

33 Summary and future perspectives

I have often heard my father say, “the more you know, the more you know you don't know”. The statement supposedly originates from Aristotle, and it certainly holds true for the scientific process. Every result leads to a new question, and thereby opens up new avenues of inquiry. In the beginning of this thesis I stated, maybe somewhat boldly, that our current understanding of genetics is only rudimentary. I do believe that is the case, and the work presented here is but a small attempt to alleviate some of our ignorance. It might however provide more questions than answers. The results in papers I, III, and IV illustrate many of the possibilities, and potential pitfalls, of searching for genetic variance heterogeneity. It might allow us to identify both epistatic and multi-allelic loci. We also show that by searching for genetic variance heterogeneity, signatures of selection can be revealed that might otherwise have been overlooked. These signatures also have the advantage of being less confounded with population structure than typical GWAS associations. In summary, vGWAS represents a promis- ing tool that might lead to interesting findings in the future, but as I have discussed above, when interpreting vQTLs, we need to be aware that they can be the result of many different processes. I have already discussed the somewhat technical issue of parameteriza- tions in linear models. In paper IV, we tried to avoid this issue by applying a more “model free” approach. When looking at the joint phenotypic effect of six loci at a time, instead of fitting a linear model, we simply visualized the phenotype of every allelic combination across these loci. As a first approach, we then looked if any patterns could be discerned among these 64 genotypes. One of the patterns that emerged was that half of the genotypes, which all had the same allele at the hub locus, had very similar phenotypes, whereas the other half displayed very different phenotypes. This was the capacitating effect of the hubs discussed above. If we instead of this visual approach where to fit a linear model that captures all 64 genotypes, we would need to include up to 6-way interaction terms. While this is possible with the given data, interpreting such a model is not straightforward. Out of the 64 parame- ters in such a model, we will observe a mixture of main effects, pairwise effects, and higher order interaction effects that are significant. But, as stated above, the underlying biology that the model attempts to capture is simply that certain genotypes give rise to certain phenotypes. Instead of trying to interpret the parameters of such a linear model, we used our visual approach.

34 As we have seen, the magnitudes and significances of the parameters in a linear model are determined not only by the dependencies between the al- leles, but also by the parameterization and the allele frequencies. One conse- quence of this is that these parameter estimates are not expected to general- ize across different populations. However, if we can identify the functional dependencies between all alleles affecting our trait, this underlying mecha- nism should be the same regardless of the population. Such a complete de- scription of the phenotype produced by every possible genotype is often referred to as the genotype-phenotype map60, and ultimately I believe this is what we need in order to make accurate predictions of individual phenotypes based on their genotypes. The “model free” approach described above has the drawback of being rather labor intensive. It relies on manual detective work by the researcher, trying to visually discern patterns by looking at the phenotypes associated with every genotype. Also, before utilizing this approach we obviously need to decide which loci to have a closer look at. In other words, we need to nar- row down the search space. In paper IV, we started with the results from a pairwise epistatic scan, which was used to identify epistatic networks. These networks represent loci whose phenotypic effects are to some degree mutual- ly dependent on each other. Since the present sample size allowed us to ex- amine the full genotype-phenotype map of up to six loci, we next picked subsets of six loci from these networks. In order to prioritize which subsets to examine, we somewhat arbitrarily picked the five strongest interactions around every network hub. This allowed us to identify many interesting pat- terns, but there might certainly be more findings waiting to be revealed in this dataset. For instance, a genome wide scan for higher order interactions might reveal additional genetic interactions that are of importance14. The concerns raised above apply also to future studies. Even if we manage to design experiments that allow us to characterize the full genotype-phenotype map of large numbers of loci, how do we decide which loci to prioritize? Are brute force epistatic scans the way to go? Is it feasible to manually study the genotype-phenotype maps of large numbers of loci? These questions regard- ing the optimal analytical strategies to study epistasis deserves more atten- tion as the field of genetics moves forward. As we show in paper IV, certain individuals might carry multi locus gen- otypes that result in phenotypes far outside of the expectations based on the marginal effects of these loci. In particular, we found many instances where such deviant genotypes gave rise to phenotypes in the extremes of the popu- lation-wide phenotype distribution. We also show that despite this, a majori- ty of the genetic variance can be additive. This observation could be of im- portance for prediction in a clinical setting, where the extreme phenotypes are often the most important ones. Even if the marginal/additive effects ex- plain a lot of phenotypic variance in the population as a whole, they might miss-predict individuals with extreme phenotypes. These individuals might

35 develop certain deceases, and would therefore be the ones that we actually care about. The variance explained by the additive effects would thus not be a good indicator of whether an additive model fulfills its purpose, which in this case would be to tell which patients will get sick. Apart from the general points about additive versus non-additive genetics raised in this thesis, several interesting biological insights have been reached in these four investigations. In paper I, we revealed a link between the gene CMT2, methylation of transposable elements, and heat tolerance. Within the natural population of A. Thaliana, we found several disruptive mutations in this gene that appears to have been selected for in certain environments. We found evidence that the mutations affect the genome wide methylation pat- tern, suggesting that epigenetic modifications can be genetically determined. This is an important observation with regards to the role of epigenetics in evolution. There has been a fair deal of discussion about how epigenetics might allow acquired traits to be passed on to future generations, presenting an intriguing Lamarckian angle to genetics61. However, the results presented in paper I suggest that the determinants of epigenetic marks might still be found within the genome. In paper II, we found an association between a locus on chromosome 3 and fructosamine levels in Belgian Shepherd dogs. At this point we do not know the causative allele(s) underlying this signal, but the genes GAPDH and LETM1, which play a central role in glucose metabolism, are located close to the association. The interacting locus on chromosome 5 harbors the melanocortin receptor gene MC1R. A dominant allele of this gene is known to cause the melanistic mask phenotype commonly found in Belgian Shep- herd, German Shepherd, and Boxer dogs62. All of these three breeds share the pattern of reduced heterozygosity in this region. What we observe is thus likely a selective sweep resulting from breeding for the melanistic mask phenotype. The melanocortin receptor pathways has been linked to various metabolic disorders, including diabetes and obesity63,64, but the MC1R type of the receptor family has not previously been linked to any metabolic phe- notypes. We can only speculate as to why we observe an epistatic effect on fructosamine involving this locus. A follow up study is however under way that might allow us to identify the causative alleles at both of these loci. In paper III, we revealed four associations within a confined region on chromosome 2 to the ability of A. Thaliana to accumulate molybdenum. For two of these associations, the causative alleles are likely structural variants within the promoter region of the gene MOT1. For the other two associa- tions, we do not yet have any good candidates as to the causative alleles. By evaluating T-DNA insertional alleles for genes in LD with these two associa- tions, we were however able to suggest two novel genes involved in molyb- denum homeostasis in A. Thaliana. One of these genes, COPT6, is a known copper transporter65, suggesting a possible link between copper and molyb- denum uptake in plants. Molybdenum is essential for plant development, due

36 to its role in the molybdenum cofactor that forms the active site of all mo- lybdenum enzymes66. These findings are thus important for our understand- ing of how plants can adapt to soils with different levels of molybdenum, and they might also be of interest for future plant breeding efforts. There are many challenges on the horizon for the field of genetics. The last decades has seen an enormous development in both sequencing and gene editing techniques. We now have the means to both read and edit the ge- nomes of organisms on an unprecedented scale. But our understanding of the link between genotype and phenotype, the fundamental question of genetics, has not increased proportionally. Using a somewhat strained analogy, we are trying to operate an immensely complex computer by reading and making changes to its binary code. But we have only a crude understanding of how information in this code translates into end product. Ultimately, we would like to be able to predict the phenotype of an individual based on its geno- type. For some monogenic traits, we can already do this. But for most com- plex polygenic traits, we still have a long way to go. I hope that the work presented in this thesis is a small step towards this goal.

37 Supplements

Statistical Models Linear models are the bread and butter of quantitative genetics. I will here give a very brief outline of the models that are most commonly used. This section assumes that the reader has basic knowledge of statistics and linear algebra. The simplest linear model can be written as equation (1), where y is the response variable, X is a matrix with columns corresponding to the ex- planatory variables, � is a vector of fitted effect sizes, and � ~ �(0, ��!) are the residuals.

� = �� + � (1)

In a genetics context, y contains the individual phenotypes and X typically contains the genotypes at the loci that we want to test for association to the phenotype. Each column of X then corresponds to one locus, and each row to one individual. The estimated parameters � tell us the average phenotypic difference between individuals with alternative alleles at the corresponding loci. Using a technique such as a likelihood ratio test, we can test whether this difference is significant. One interpretation of the �:s is that they repre- sent the phenotypic change that we expected to observe, if we were to substi- tute one allele for the other at the corresponding position. This interpretation however, requires the assumption that the phenotypic effects of the alleles at the different positions in the genome are independent of each other. In other words, if we were to edit the genome of individual A, changing one allele for the other at a certain locus, this will produce the same effect on the pheno- type regardless of the genetic makeup of this individual at other loci. As discussed above, this might not always be a reasonable assumption. Equation (1) is an example of a fixed effects model, meaning that all the estimated parameters � are fixed. Conceptually this means that although we estimate them with some degree of uncertainty, we assume that there is one true value for the parameter in this population. In linear mixed models on the other hand, we can include parameters that are in themselves random varia- bles, meaning that they come with their own inherent uncertainty. Such pa- rameters are called random effects. This opens up a wide range of ways to

38 model and analyze data, making linear mixed models extremely flexible and useful in various applications. Here, I will only describe the way in which they are typically used in GWAS. In its general form, a linear mixed model can be written as equation (2). As in equation (1), the vector � contains the fixed effects and X is the corresponding design matrix. The vector ! � ~ �(0, ��! ) contains the random effects, Z is the design matrix for the ! random effects, and the vector � ~ �(0, ��! ) contains the residuals. In this formulation of the model, we assume that the random effects b are uncorre- lated, indicated by the identity matrix I in their covariance structure.

� = �� + �� + � (2)

In a GWAS setting, this model allows us to incorporate information about the relatedness, in genetics referred to as kinship, between all analyzed indi- viduals. Doing so, we can correct for the phenotypic effect that can be ex- plained by population structure, giving us a way to deal with the potential problems of confounding between the population structure and the pheno- type. Formulating the model as in equation (3) allows us to more clearly see ! how this is achieved. Here � ~ �(0, ��! ), where G is the genetic kinship matrix, describing the degree of relatedness between every pair of individu- als. Monozygotic twins for instance, have a kinship value of 1, whereas full- sibs have an average value of 0.5. Fitting such a model to our data, we can test the phenotypic effect of certain alleles, given by our fixed effects �, while correcting for the potential confounding with population structure.

� = �� + � + � (3)

39 Acknowledgements

Now comes the most popular part of any thesis. If you have read everything up this point, I congratulate you and hope that at least some of it made sense. The studies that make up this thesis were performed at SLU, Uppsala Uni- versity, and UCLA. I have worked with a lot of great people during the last four years and I would like to thank all of you. In particular, I would like to acknowledge the following people. My apologies in advance to anyone who feel left out. Örjan, I am truly grateful to have had you as my supervisor. Apart from being an outstanding scientist and mentor, you have also become a good friend. I value all our long discussions on science, politics and various other topics that is better not named here. Whether it be scientific questions, rela- tionship advice, or broken cars, you have always been there to help. During the course of the last four years, you have helped me grow as a researcher and we have developed a very inspiring and creative working relationship. I hope that we will continue to work together in the future. Marcin, you are a good friend, mentor, and machine learning-wizard. You are the reason I started this phd in the first place and for that I am grateful. Sharing office with you was the best start a new and inexperienced phd- student could hope for. Thank you for everything you’ve taught me about science and for all the crazy discussions. Monika, you are a great contrast to all the nerds, such as myself, that popu- late this business. It is ok to not like Star Wars, don’t let anyone tell you otherwise! Thank you for your moral support and for all the discussions, the clever ones and the less appropriate ones. Mette, among your many qualities, I admire your positivity and enthusiasm. Being of a more Nordic and introverted nature myself, you are a welcome antidote to my Swedish ways. We also share a common interest in beer, which, as you have pointed out, might be our edge in this business. I’m look- ing forward to more of that in the future. Yanjun, your development since you arrived in the lab has been truly amaz- ing. You manage to combine the qualities of being kind, humble, and a tal- ented scientist. Without your input, this would surely have been a worse thesis.

40 Ronnie, my introverted South African kindred spirit. We always seem to have something interesting to discuss, and I appreciate your somewhat nihil- istic perspective. I hope that there will be many more rewarding, intellectual, and sometimes pretentious conversations in the future. Mats, you always have an insightful perspective on every topic, be it science or Civilisation V. Thank you for always keeping your office door open, and for putting up with all my questions. Zheya, you just spread joy to everyone around you. The lab has not been the same since you left! I remember in particular our trips to Italy and China. You are a good friend and a great scientist and I wish you all the best. My brilliant statistical colleagues Xia and Lars. Thank you for all your great scientific input and for showing me the wonders of statistics. Xia, that time in Italy when you made your point about modeling a chain of causally linked variables, through derivations on a napkin, is particularly memorable. Lars, we should really make that hiking trip that we’ve been talking about. All the people in the Kruglyak lab. You made my three months at UCLA truly memorable. Thank you for putting up with the weird Swede! Nima, Sangeet, Sharda, Leif, Susanne, Ginger, Erik, Jessika, Matteo, Martin, Mat, and all the others at IMBIM. Thank you for making me feel welcome here. Mad Artwork, the best band in the world. It is truly a privilege to be able to play with such talented musicians. I believe that our mad music has resulted in a couple of more or less sensible scientific ideas as well. Keep rocking! Herman och Lisa, mina äldsta och mest trofasta vänner. Jag ska inte tråka ut eventuella läsare med allt ni ska ha tack för. Men tack ska ni ha! Det är inte alla förunnat att ha en så stöttande och kärleksfull familj som jag. Världens bästa Mamma! Du är en klippa. Tack för att du alltid ställer upp och stöttar oavsett vad det gäller. Pappa, du har alltid tålmodigt diskuterat och förklarat allt från relativitetste- ori till musik och medeltidshistoria. Jag minns att du redan när jag gick i lågstadiet försökte förklara varför tiden går långsammare om man sätter sig i ett rymdskepp och åker riktigt fort. Tack för att du alltid varit intresserad och tagit alla mer eller mindre rimliga frågor och funderingar på allvar! Jag är övertygad om att mitt intresse för vetenskap är din förtjänst. En storebror är alltid bra att ha! Tack bland annat för alla musiktips. Bluesrock från 70-talet är faktiskt en bra grej att ha i lurarna när man skriver avhandling. Slutligen vore inget av det här möjligt utan Emma. Tack för att du finns!

41 References

1. Bateson, W. & Mendel, G. Mendel’s principles of heredity. (Cambridge University Press, 1909). 2. Darwin, C. The origin of species. (1859). 3. Galton, F. A Theory of Heredity. J. Anthropol. Inst. Gt. Britain Irel. 5, 329–348 (1876). 4. Galton, F. Typical laws of heredity. Nature 15, 512–514 (1877). 5. Fisher, R. A. The genetical theory of natural selection: a complete variorum edition. (Oxford University Press, 1930). 6. Watson, J. D., Crick, F. H. C. & others. Molecular structure of nucleic acids. Nature 171, 737–738 (1953). 7. Sanger, F. et al. Nucleotide sequence of bacteriophage phi X174 DNA. Nature 265, 687–695 (1977). 8. Shendure, J. & Ji, H. Next-generation DNA sequencing. Nat. Biotechnol. 26, 1135–1145 (2008). 9. Hill, W. G., Goddard, M. E. & Visscher, P. M. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet. 4, (2008). 10. Mackay, T. F. C. Epistasis and quantitative traits: using model organisms to study gene-gene interactions. Nat. Rev. Genet. 15, 22–33 (2014). 11. Carlborg, O. & Haley, C. S. Epistasis: too often neglected in complex trait studies? Nat. Rev. Genet. 5, 618–625 (2004). 12. Le Rouzic, A., Siegel, P. B. & Carlborg, O. Phenotypic evolution from genetic polymorphisms in a radial network architecture. BMC Biol. 5, 50 (2007). 13. Hansen, T. F. Why epistasis is important for selection and adaptation. Evolution (N. Y). 67, 3501–3511 (2013). 14. Weinreich, D. M., Lan, Y., Wylie, C. S. & Heckendorn, R. B. Should evolutionary geneticists worry about higher-order epistasis? Curr. Opin. Genet. Dev. 23, 700–707 (2013). 15. Wright, S. The roles of mutation, inbreeding, crossbreeding and selection in evolution. Proceedings of the Sixth International Congress on Genetics 1, 356– 366 (1932). 16. Gibson, G. & Dworkin, I. Uncovering cryptic genetic variation. Nat. Rev. Genet. 5, 681–690 (2004). 17. Paaby, A. B. & Rockman, M. V. Cryptic genetic variation: evolution’s hidden substrate. Nat. Rev. Genet. 15, 247–58 (2014). 18. Taylor, M. B., Phan, J., Lee, J. T., McCadden, M. & Ehrenreich, I. M. Diverse genetic architectures lead to the same cryptic phenotype in a yeast cross. Nat. Commun. 7, 11669 (2016). 19. Rutherford, S. L. & Lindquist, S. Hsp90 as a capacitor for morphological evolution. Nature 396, 336–342 (1998). 20. Rohner, N. et al. Cryptic variation in morphological evolution: HSP90 as a capacitor for loss of eyes in cavefish. Science (80-. ). 342, 1372–1375 (2013).

42 21. Queitsch, C., Sangster, T. a & Lindquist, S. Hsp90 as a capacitor of phenotypic variation. Nature 417, 618–24 (2002). 22. Box, G. E. P., Draper, N. R. & others. Empirical model-building and response surfaces. 424, (Wiley New York, 1987). 23. Falconer, D. S. & Mackay, T. F. C. Introduction to quantitative genetics. Harlow: Longman (1996). 24. Lynch, M., Walsh, B. & others. Genetics and analysis of quantitative traits. 1, (Sinauer Sunderland, MA, 1998). 25. Huang, W. & Mackay, T. F. C. The Genetic Architecture of Quantitative Traits Cannot Be Inferred From Variance Component Analysis. bioRxiv 41434 (2016). doi:10.1101/041434 26. Lehner, B. Molecular mechanisms of epistasis within and between genes. Trends Genet. 27, 323–331 (2011). 27. Cheverud, J. M. & Routman, E. J. Epistasis and its contribution to genetic variance components. Genetics 139, 1455–61 (1995). 28. Alvarez-Castro, J. M. & Carlborg, O. A unified model for functional and statistical epistasis and its application in quantitative trait Loci analysis. Genetics 176, 1151–67 (2007). 29. Bloom, J. S. et al. Genetic interactions contribute less than additive effects to quantitative trait variation in yeast. Nat. Commun. 6, 8712 (2015). 30. Rönnegård, L. & Valdar, W. Recent developments in statistical methods for detecting genetic loci affecting phenotypic variability. BMC Genet. 13, 63 (2012). 31. Waddington, C. H. Canalization of development and the inheritance of acquired characters. Nature 3811, 563–565 (1942). 32. Hill, W. G. & Mulder, H. a. Genetic analysis of environmental variation. Genet. Res. (Camb). 92, 381–95 (2010). 33. Shen, X., Pettersson, M., Rönnegård, L. & Carlborg, Ö. Inheritance beyond plain heritability: variance-controlling genes in Arabidopsis thaliana. PLoS Genet. 8, e1002839 (2012). 34. Ayroles, J. F. et al. Behavioral idiosyncrasy reveals genetic control of phenotypic variability. Proc. Natl. Acad. Sci. 112, 201503830 (2015). 35. Rönnegård, L. & Valdar, W. Detecting major genetic loci controlling phenotypic variability in experimental crosses. Genetics 188, 435–47 (2011). 36. Paré, G., Cook, N. R., Ridker, P. M. & Chasman, D. I. On the use of variance per genotype as a tool to identify quantitative trait interaction effects: a report from the Women’s Genome Health Study. PLoS Genet. 6, e1000981 (2010). 37. Struchalin, M. V, Dehghan, A., Witteman, J. C., van Duijn, C. & Aulchenko, Y. S. Variance heterogeneity analysis for detection of potentially interacting genetic loci: method and its limitations. BMC Genet. 11, 92 (2010). 38. Struchalin, M. V, Amin, N., Eilers, P. H. C., van Duijn, C. M. & Aulchenko, Y. S. An R package ‘VariABEL’ for genome-wide searching of potentially interacting loci by testing genotypic variance heterogeneity. BMC Genet. 13, 4 (2012). 39. Mitchell-Olds, T. Arabidopsis thaliana and its wild relatives: A model system for ecology and evolution. Trends Ecol. Evol. 16, 693–700 (2001). 40. Hancock, A. M. et al. Adaptation to Climate Across the Arabidopsis thaliana Genome. Science (80-. ). 334, 83–86 (2011). 41. Horton, M. W. et al. Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel. Nat. Genet. 44, 212–6 (2012).

43 42. Alonso-Blanco, C. et al. 1,135 Genomes Reveal the Global Pattern of Polymorphism in Arabidopsis thaliana. Cell 166, 481–491 (2016). 43. Stroud, H. et al. Non-CG methylation patterns shape the epigenetic landscape in Arabidopsis. Nat. Struct. Mol. Biol. 21, 64–72 (2014). 44. Zemach, A. et al. The Arabidopsis nucleosome remodeler DDM1 allows DNA methyltransferases to access H1-containing heterochromatin. Cell 153, 193–205 (2013). 45. Campbell, C. D. et al. Demonstrating stratification in a European American population. Nat. Genet. 37, 868–872 (2005). 46. Helgason, A., Yngvadóttir, B., Hrafnkelsson, B., Gulcher, J. & Stefánsson, K. An Icelandic example of the impact of population structure on association studies. Nat. Genet. 37, 90–5 (2005). 47. Hermisson, J. & Pennings, P. S. Soft sweeps: Molecular population genetics of adaptation from standing genetic variation. Genetics 169, 2335–2352 (2005). 48. Fall, T., Hamlin, H. H., Hedhammar, A., Kämpe, O. & Egenvall, A. Diabetes mellitus in a population of 180,000 insured dogs: incidence, survival, and breed distribution. J. Vet. Intern. Med. 21, 1209–16 (2007). 49. Lequarré, A.-S. et al. LUPA: a European initiative taking advantage of the canine genome architecture for unravelling complex disorders in both human and dogs. Vet. J. 189, 155–9 (2011). 50. Yamauchi, T. et al. A genome-wide association study in the Japanese population identifies susceptibility loci for type 2 diabetes at UBE2E2 and C2CD4A-C2CD4B. Nat. Genet. 42, 864–868 (2010). 51. Kooner, J. S. et al. Genome-wide association study in individuals of South Asian ancestry identifies six new type 2 diabetes susceptibility loci. Nat. Genet. 43, 984–989 (2011). 52. Cho, Y. S. et al. Meta-analysis of genome-wide association studies identifies eight new loci for type 2 diabetes in east Asians. Nat. Genet. 44, 67–72 (2012). 53. Kaiser, B. N., Gridley, K. L., Ngaire Brady, J., Phillips, T. & Tyerman, S. D. The role of molybdenum in agricultural plant production. Ann. Bot. 96, 745–54 (2005). 54. Botstein, D., Chervitz, S. A. & Cherry, M. Yeast as a Model Organism. Science (80-. ). 277, 1259–1260 (1997). 55. Bergman, A. & Siegal, M. L. Evolutionary capacitance as a general feature of complex gene networks. Nature 424, 549–552 (2003). 56. Paaby, A. & Gibson, G. Cryptic Genetic Variation in Evolutionary Developmental Genetics. Biology (Basel). 5, 28 (2016). 57. Le Rouzic, A. & Carlborg, Ö. Evolutionary potential of hidden genetic variation. Trends Ecol. Evol. 23, 33–37 (2008). 58. Carlborg, O., Jacobsson, L., Ahgren, P., Siegel, P. & Andersson, L. Epistasis and the release of genetic variation during long-term selection. Nat. Genet. 38, 418–20 (2006). 59. Gerke, J., Lorenz, K. & Cohen, B. Genetic interactions between factors cause natural variation in yeast. Science (80-. ). 323, 498–501 (2009). 60. Nelson, R. M., Pettersson, M. E. & Carlborg, Ö. A century after Fisher: time for a new paradigm in quantitative genetics. Trends Genet. 1–8 (2013). doi:10.1016/j.tig.2013.09.006 61. Richards, E. J. Inherited epigenetic variation--revisiting soft inheritance. Nat. Rev. Genet. 7, 395–401 (2006). 62. Schmutz, S. M. MC1R Studies in Dogs With Melanistic Mask or Brindle Patterns. J. Hered. 94, 69–73 (2003).

44 63. Vaisse, C. et al. Melanocortin-4 receptor mutations are a frequent and heterogeneous cause of morbid obesity. 253–262 (2000). 64. Mencarelli, M. et al. Rare melanocortin-3 receptor mutations with in vitro functional consequences are associated with human obesity. Hum. Mol. Genet. 20, 392–9 (2011). 65. Garcia-Molina, A. et al. The arabidopsis COPT6 transport protein functions in copper distribution under copper-deficient conditions. Plant Cell Physiol. 54, 1378–1390 (2013). 66. Schwarz, G. Molybdenum cofactor biosynthesis and deficiency. Cell. Mol. Life Sci. 62, 2792–2810 (2005).

45 Acta Universitatis Upsaliensis Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 1278 Editor: The Dean of the Faculty of Medicine

A doctoral dissertation from the Faculty of Medicine, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine”.)

ACTA UNIVERSITATIS UPSALIENSIS Distribution: publications.uu.se UPPSALA urn:nbn:se:uu:diva-307837 2016