Modeling Linkage Disequilibrium 1 Increases Accuracy of Polygenic

Total Page:16

File Type:pdf, Size:1020Kb

Modeling Linkage Disequilibrium 1 Increases Accuracy of Polygenic 1 Modeling Linkage Disequilibrium 2 Increases Accuracy of Polygenic Risk 3 Scores 4 Bjarni J. Vilhjálmsson1,2,3,4,*, Jian Yang5,6, Hilary K. Finucane1,2,3,7, 5 Alexander Gusev1,2,3, Sara Lindström1,2, Stephan Ripke8,9, Giulio 6 Genovese3,8,10, Po-Ru Loh1,2,3, Gaurav Bhatia1,2,3, Ron Do11,12, Tristan 7 Hayeck1,2,3, Hong-Hee Won3,13, Schizophrenia Working Group of the 8 Psychiatric Genomics Consortium, the Discovery, Biology, and Risk of 9 Inherited Variants in Breast Cancer (DRIVE) study, Sekar Kathiresan3,13, 10 Michele Pato14, Carlos Pato14, Rulla Tamimi1,2,15, Eli Stahl16, Noah 11 Zaitlen17, Bogdan Pasaniuc18, Gillian Belbin11,12, Eimear Kenny11,12,19,20, 12 Mikkel H. Schierup4, Philip De Jager3,21,22 , Nikolaos A. Patsopoulos3,21,22, 13 Steve McCarroll3,8,10, Mark Daly3,8, Shaun Purcell15, Daniel Chasman23, 14 Benjamin Neale3,8, Michael Goddard24,25, Peter M. Visscher5,6, Peter 15 Kraft1,2,3,26, Nick Patterson3, Alkes L. Price1,2,3,26,* 16 17 18 1. Department of Epidemiology, Harvard T.H. Chan School of Public 19 Health, Boston, 02115 MA, USA. 20 2. Program in Genetic Epidemiology and Statistical Genetics, Harvard 21 T.H. Chan School of Public Health, Boston, 02115 MA, USA. 22 3. Program in Medical and Population Genetics, Broad Institute of 23 Harvard and MIT, Cambridge, 02142 MA, USA. 24 4. Bioinformatics Research Centre, Aarhus University, 8000 Aarhus, 25 Denmark. 26 5. Queensland Brain Institute, The University of Queensland, 27 Brisbane, 4072 Queensland, Australia. 28 6. The Diamantina Institute, The Translational Research Institute, 29 University of Queensland, Brisbane, 4101 Queensland, Australia. 30 7. Department of Mathematics, Massachusetts Institute of Technology, 31 Cambridge, 02139 MA, USA. 32 8. Stanley Center for Psychiatric Research, Broad Institute of MIT and 33 Harvard, Cambridge, 02142 MA, USA. 34 9. Analytic and Translational Genetics Unit, Massachusetts General 35 Hospital, Boston, 02114 MA, USA. 36 10.Department of Genetics, Harvard Medical School, Boston, 02115 37 MA, USA. 38 11.The Charles Bronfman Institute of Personalized Medicine, The 39 Icahn School of Medicine at Mount Sinai, New York, NY, USA. 40 12.Department of Genetics and Genomic Sciences, The Icahn School of 41 Medicine at Mount Sinai, New York, NY, USA. 1 1 13.Cardiovascular Research Center, Massachusetts General Hospital, 2 Harvard Medical School, Boston, 02114 MA, USA. 3 14.Department of Psychiatry and Behavioral Sciences, Keck School of 4 Medicine at University of Southern California, Los Angeles, 90089 5 CA, USA. 6 15.Channing Division of Network Medicine, Brigham and Women’s 7 Hospital, Boston, MA, 02115 USA. 8 16.The Department of Psychiatry at Mount Sinai School of Medicine, 9 New York, 10029 NY, USA 10 17.Department of Medicine, Lung Biology Center, University of 11 California San Francisco, San Francisco, 94143 CA, USA. 12 18.Department of Pathology and Laboratory Medicine, University of 13 California Los Angeles, Los Angeles, 90095 CA, USA. 14 19.The Icahn Institute of Genomics and Multiscale Biology, The Icahn 15 School of Medicine at Mount Sinai, New York, NY, USA. 16 20.The Center of Statistical Genetics, The Icahn School of Medicine at 17 Mount Sinai, New York, NY, USA. 18 21.Department of Medicine, Harvard Medical School, Boston, 02115 19 MA, USA. 20 22.Program in Translational NeuroPsychiatric Genomics, Ann Romney 21 Center for Neurologic Diseases, Department of Neurology, Brigham 22 and Women’s Hospital, Boston, 02115 MA, USA 23 23.Division of Preventive Medicine, Brigham and Women’s Hospital, 24 Boston, 02215 MA, USA. 25 24.Department of Food and Agricultural Systems, University of 26 Melbourne, Parkville, 3010 Victoria, Australia. 27 25.Biosciences Research Division, Department of Primary Industries, 28 Bundoora, 3083 Victoria, Australia. 29 26.Department of Biostatistics, Harvard T.H. Chan School of Public 30 Health, Boston, 02115 MA, USA. 31 32 Correspondence should be addressed to B.J.V. 33 ([email protected]) or A.L.P. ([email protected]). 34 ∗ 35 36 37 38 39 40 41 42 43 2 1 Abstract: 2 Polygenic risk scores have shown great promise in predicting complex disease 3 risk, and will become more accurate as training sample sizes increase. The 4 standard approach for calculating risk scores involves LD-pruning markers 5 and applying a P-value threshold to association statistics, but this discards 6 information and may reduce predictive accuracy. We introduce a new method, 7 LDpred, which infers the posterior mean effect size of each marker using a 8 prior on effect sizes and LD information from an external reference panel. 9 Theory and simulations show that LDpred outperforms the 10 pruning/thresholding approach, particularly at large sample sizes. 11 Accordingly, prediction R2 increased from 20.1% to 25.3% in a large 12 schizophrenia data set and from 9.8% to 12.0% in a large multiple sclerosis 13 data set. A similar relative improvement in accuracy was observed for three 14 additional large disease data sets and when predicting in non-European 15 schizophrenia samples. The advantage of LDpred over existing methods will 16 grow as sample sizes increase. 17 18 Introduction 19 Polygenic risk scores (PRS) computed from genome-wide association study (GWAS) 20 summary statistics have proven valuable for predicting disease risk and 21 understanding the genetic architecture of complex traits. PRS were used to predict 22 genetic risk in a schizophrenia GWAS for which there was only one genome-wide 23 significant locus1 and have been widely used to predict genetic risk for many traits1- 24 15. PRS can also be used to draw inferences about genetic architectures within and 25 across traits12,13,16-18. As GWAS sample sizes grow the prediction accuracy of PRS 26 will increase and may eventually yield clinically actionable predictions16,19-21. 27 However, as noted in recent work19, current PRS methods do not account for effects 28 of linkage disequilibrium (LD), which limits their predictive value, especially for 29 large samples. Indeed, our simulations show that, in the presence of LD, the 30 prediction accuracy of the widely used approach of LD-pruning followed by P-value 31 thresholding1,6,8,9,12,13,15,16,19,20 falls short of the heritability explained by the SNPs 32 (Figure 1 and Supplementary Figure 1; see Materials and Methods). 33 34 One possible solution to this problem is to use one of the many available prediction 35 methods that require genotype data as input, including genomic BLUP—which 36 assumes an infinitesimal distribution of effect sizes—and its extensions to non- 37 infinitesimal mixture priors22-29. However, these methods are not applicable to 38 GWAS summary statistics when genotype data are unavailable due to privacy 39 concerns or logistical constraints, as is often the case. In addition, many of these 40 methods become computationally intractable at the very large sample sizes (>100K 41 individuals) that would be required to achieve clinically relevant predictions for 42 most common diseases16,19,20. 43 3 1 In this study we propose a Bayesian polygenic risk score, LDpred, which estimates 2 posterior mean causal effect sizes from GWAS summary statistics assuming a prior 3 for the genetic architecture and LD information from a reference panel. By using a 4 point-normal mixture prior26,30 for the marker effects, LDpred can be applied to 5 traits and diseases with a wide range of genetic architectures. Unlike LD-pruning 6 followed by P-value thresholding, LDpred has the desirable property that its 7 prediction accuracy converges to the heritability explained by the SNPs as sample 8 size grows (see below). Using simulations based on real genotypes we compare the 9 prediction accuracy of LDpred to the widely used approach of LD-pruning followed 10 by P-value thresholding1,6,8,9,12,13,15,16,19,20,31, as well as other approaches that train on 11 GWAS summary statistics. We apply LDpred to seven common diseases for which 12 raw genotypes are available in small sample size, and to five common diseases for 13 which only summary statistics are available in large sample size. 14 Materials and Methods 15 Overview of Methods 16 LDpred calculates the posterior mean effects from GWAS summary statistics 17 conditional on a genetic architecture prior and LD information from a reference 18 panel. The inner product of these re-weighted effect sizes with test sample 19 genotypes is the posterior mean phenotype and thus, under the model assumptions 20 and available data, an optimal (minimum variance and unbiased) predictor32. The 21 prior for the effect sizes is a point-normal mixture distribution, which allows for 22 non-infinitesimal genetic architectures. The prior has two parameters, the 23 heritability explained by the genotypes, and the fraction of causal markers (i.e. the 24 fraction of markers with non-zero effects). The heritability parameter is estimated 25 from GWAS summary statistics, accounting for sampling noise and LD33-35 (see 26 details below). The fraction of causal markers is allowed to vary and can be 27 optimized with respect to prediction accuracy in a validation data set, analogous to 28 how LD-pruning followed by P-value thresholding (P+T) is applied in practice. 29 Hence, similar to P+T, where P-value thresholds are varied and multiple PRS are 30 calculated, multiple LDpred risk scores are calculated using priors with varying 31 fractions of markers with non-zero effects. The value optimizing prediction 32 accuracy can then be determined in an independent validation data set. We 33 approximate LD using data from a reference panel (e.g. independent validation 34 data). The posterior mean effect sizes are estimated via Markov Chain Monte Carlo 35 (MCMC), and applied to validation data to obtain polygenic risk scores.
Recommended publications
  • Response to Selection in Finite Locus Models with Nonadditive Effects
    Journal of Heredity, 2017, 318–327 doi:10.1093/jhered/esw123 Original Article Advance Access publication January 12, 2017 Original Article Response to Selection in Finite Locus Models with Nonadditive Effects Hadi Esfandyari, Mark Henryon, Peer Berg, Jørn Rind Thomasen, Downloaded from https://academic.oup.com/jhered/article/108/3/318/2895120 by guest on 23 September 2021 Piter Bijma, and Anders Christian Sørensen From the Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University DK-8830 Tjele, Denmark (Esfandyari, Berg, Thomasen, and Sørensen); Seges, Danish Pig Research Centre, Copenhagen, Denmark (Henryon); School of Animal Biology, University of Western Australia, Crawley, Australia (Henryon); Nordic Genetic Resource Center, Ås, Norway (Berg); VikingGenetics, Assentoft, Denmark (Thomasen); and Animal Breeding and Genomics Centre, Wageningen University, Wageningen, The Netherlands (Bijma). Address correspondence to H. Esfandyari at the address above, or e-mail: hadi.esfandyari@ mbg.au.dk. Received May 18, 2016; First decision June 7, 2016; Accepted January 2, 2017. Corresponding Editor: C Scott Baker Abstract Under the finite-locus model in the absence of mutation, the additive genetic variation is expected to decrease when directional selection is acting on a population, according to quantitative-genetic theory. However, some theoretical studies of selection suggest that the level of additive variance can be sustained or even increased when nonadditive genetic effects are present. We tested the hypothesis that finite-locus models with both additive and nonadditive genetic effects maintain more additive genetic variance (VA) and realize larger medium- to long-term genetic gains than models with only additive effects when the trait under selection is subject to truncation selection.
    [Show full text]
  • Directional Selection and Variation in Finite Populations
    Copyright 0 1987 by the Genetics Society of America Directional Selection and Variation in Finite Populations Peter D. Keightley and ‘William G. Hill Institute of Animal Genetics, University of Edinburgh, West Mains Road, Edinburgh EH9 3JN, Scotland Manuscript received April 12, 1987 Revised copy accepted July 16, 1987 ABSTRACT Predictions are made of the equilibrium genetic variances and responses in a metric trait under the joint effects of directional selection, mutation and linkage in a finite population. The “infinitesimal model“ is analyzed as the limiting case of many mutants of very small effect, otherwise Monte Carlo simulation is used. If the effects of mutant genes on the trait are symmetrically distributed and they are unlinked, the variance of mutant effects is not an important parameter. If the distribution is skewed, unless effects or the population size is small, the proportion of mutants that have increasing effect is the critical parameter. With linkage the distribution of genotypic values in the population becomes skewed downward and the equilibrium genetic variance and response are smaller as disequilibrium becomes important. Linkage effects are greater when the mutational variance is contributed by many genes of small effect than few of large effect, and are greater when the majority of mutants increase rather than decrease the trait because genes that are of large effect or are deleterious do not segregate for long. The most likely conditions for ‘‘Muller’s ratchet” are investi- gated. N recent years there has been much interest in the is attained more quickly in the presence of selection I production and maintenance of variation in pop- than in its absence, and is highly dependent on popu- ulations by mutation, stimulated by the presence of lation size.
    [Show full text]
  • Selection Response in Finite Populations
    Copyright 0 1996 by the Genetics Society of America Selection Responsein Finite Populations Ming Wei, Armando Caballero and William G. Hill Institute of Cell, Animal and Population Biology, University of Edinburgh, King’s Buildings, Edinburgh EH9 3JT, United Kingdom Manuscript received November 28, 1995 Accepted for publication September 5, 1996 ABSTRACT Formulae were derived to predict genetic response under various selection schemes assuming an infinitesimal model. Accountwas taken of genetic drift, gametic (linkage) disequilibrium (Bulmereffect), inbreeding depression, common environmental variance, and both initial segregating variance within families (a;wo)and mutational (&) variance. The cumulative response to selection until generation t(C8) can be approximated as where N, is the effective population size, = Ne& is the genetic variancewithin families at the steady state (or one-half the genic variance, which is unaffected by selection), and D is the inbreeding depression per unit of inbreeding. & is the selection response at generation 0 assuming preselection so that the linkage disequilibrium effect has stabilized. P is the derivative of the logarithm of the asymptotic response with respect to the logarithm of the within-family genetic variance, ie., their relative rate of change. & is the major determinant of the short term selection response, but aL, Ne and p are also important for the long term. A selection method of high accuracy using family information gives a small N, and will lead to a larger- response in the short term and a smaller response in the long term, utilizing mutation less efficiently. ENETIC response to one cycle of selection for a 1975) using all information available are formally the G quantitative character is solely a function of selec- methods to achieve the maximum response for one tion accuracy (the correlation between the selection generation of selection.
    [Show full text]
  • The Qtn Program and the Alleles That Matter for Evolution: All That’S Gold Does Not Glitter
    PERSPECTIVE doi:10.1111/j.1558-5646.2011.01486.x THE QTN PROGRAM AND THE ALLELES THAT MATTER FOR EVOLUTION: ALL THAT’S GOLD DOES NOT GLITTER Matthew V. Rockman1,2 1Department of Biology and Center for Genomics and Systems Biology, New York University, 12 Waverly Place, New York, NY 10003 2E-mail: [email protected] Received April 20, 2011 Accepted September 30, 2011 The search for the alleles that matter, the quantitative trait nucleotides (QTNs) that underlie heritable variation within populations and divergence among them, is a popular pursuit. But what is the question to which QTNs are the answer? Although their pursuit is often invoked as a means of addressing the molecular basis of phenotypic evolution or of estimating the roles of evolutionary forces, the QTNs that are accessible to experimentalists, QTNs of relatively large effect, may be uninformative about these issues if large-effect variants are unrepresentative of the alleles that matter. Although 20th century evolutionary biology generally viewed large-effect variants as atypical, the field has recently undergone a quiet realignment toward a view of readily discoverable large- effect alleles as the primary molecular substrates for evolution. I argue that neither theory nor data justify this realignment. Models and experimental findings covering broad swaths of evolutionary phenomena suggest that evolution often acts via large numbers of small-effect polygenes, individually undetectable. Moreover, these small-effect variants are different in kind, at the molecular level, from the large-effect alleles accessible to experimentalists. Although discoverable QTNs address some fundamental evolutionary questions, they are essentially misleading about many others.
    [Show full text]
  • Alleles Versus Mutations: Understanding the Evolution of Genetic Architecture Requires a Molecular Perspective on Allelic Origins
    Alleles versus mutations: Understanding the evolution of genetic architecture requires a molecular perspective on allelic origins By: David L. Remington Remington, D. L. (2015). Alleles versus mutations: understanding the evolution of genetic architecture requires a molecular perspective on allelic origins. Evolution: international journal of organic evolution. doi:10.1111/evo.12775 "This is the peer reviewed version of the following article: Remington, D. L. (2015). Alleles versus mutations: understanding the evolution of genetic architecture requires a molecular perspective on allelic origins. Evolution: international journal of organic evolution. doi:10.1111/evo.12775, which has been published in final form at http://dx.doi.org/10.1111/evo.12775. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving." Abstract: Perspectives on the role of large-effect quantitative trait loci (QTL) in the evolution of complex traits have shifted back and forth over the past few decades. Different sets of studies have produced contradictory insights on the evolution of genetic architecture. I argue that much of the confusion results from a failure to distinguish mutational and allelic effects, a limitation of using the Fisherian model of adaptive evolution as the lens through which the evolution of adaptive variation is examined. A molecular-based perspective reveals that allelic differences can involve the cumulative effects of many mutations plus intragenic recombination, a model that is supported by extensive empirical evidence. I discuss how different selection regimes could produce very different architectures of allelic effects under a molecular-based model, which may explain conflicting insights on genetic architecture from studies of variation within populations versus between divergently selected populations.
    [Show full text]
  • 100 Years of Quantitative Genetics Theory and Its Applications: Celebrating the Centenary of Fisher 1918
    100 years of quantitative genetics theory and its applications: celebrating the centenary of Fisher 1918 Nick Barton Institute for Science and Technology, Klosterneuburg, Austria The infinitesimal model In his 1918 paper, Fisher introduced the analysis of variance, and reconciled Mendelian genetics with continuous biometric variation. He also showed that when variation is due to very many additive genetic loci, selection hardly changes the genetic variance. This has come to be known as the “infinitesimal model”, and is the foundation for practical animal breeding. Yet, organisms are plainly not simply the sum of their genes, and so it is not clear when the use of the infinitesimal model can be justified. I review the history of the infinitesimal model, and show how it can be extended to include dominance, epistasis, and linkage. Sharon Browning University of Washington, Seattle, Washington, USA Identity by descent and the correlation between distant relatives Identity by descent and the correlation between distant relatives Related individuals have correlated trait values because they share parts of their genomes identical by descent. Because the genome is inherited in large chunks, punctuated by crossovers, the identity by descent (IBD) occurs in long segments. IBD segments inherited from more distant ancestors tend to be shorter due to more opportunities for crossing over. So-called “unrelated” individuals are in fact distant relatives, and share short segments of IBD. I will describe the detection of segments of IBD in population samples using genome-wide genetic data, and the application of these segments to IBD mapping and to estimating heritability. Edward Buckler USDA-ARS, Ithaca, New York, USA How to get to Plant Breeding 4.0, given that Fisher was right? As global population continues to grow and standards of living rise, the world food supply needs to almost double protein production.
    [Show full text]
  • Animal Breeding from Infinitesimal Model to Mas: the Case of a Backcross Design in Dairy Sheep (Sarda X Lacaune) and Its Possible Impact on Selection
    MARKER ASSISTED SELECTION: A FAST TRACK TO INCREASE GENETIC GAIN IN PLANT AND ANIMAL BREEDING? SESSION II: MAS IN ANIMALS ANIMAL BREEDING FROM INFINITESIMAL MODEL TO MAS: THE CASE OF A BACKCROSS DESIGN IN DAIRY SHEEP (SARDA X LACAUNE) AND ITS POSSIBLE IMPACT ON SELECTION G. Pagnacco and A. Carta [email protected] [email protected] Summary Despite the great advancements achieved by quantitative techniques, selection in some species and for some traits, as in dairy sheep, can be very expensive and requires great efforts. In this paper we review the options offered by quantitative methods and by marker-assisted selection and we discuss the possible impact of a QTL detection design based on a backcross experimental population (Sarda X Lacaune) on sheep breeding. Keywords Infinitesimal model, Quantitative Trait Loci, Marker-assisted selection, dairy sheep, Backcross Design Abstract Selection under the infinitesimal model Standard animal breeding techniques have been largely successful in improving the performance of domestic animals in the last century. The success of these techniques finds its roots in the theoretical work of Fisher and Wright who showed how selection, migration and mutation could be derived from the simple Mendelian laws. All that theory was then transferred into the operational application in animal breeding by J. L. Lush (Animal Breeding Plans, 1945) and by his many “sons” and “grandsons”. In more recent times another landmark book, Introduction to Quantitative Genetics, (Falconer, 1960) probably represents the official birth of the quantitative genetics in plant and especially animal breeding. Plant and animal breeding have based their tools of investigation on this “central paradigm for the analysis of phenotypic variation” (Lynch and Walsh, 1998) even if their methods have radically diverged probably because of the different population structure in crop and domestic animals.
    [Show full text]
  • Fisher's Infinitesimal Model: a Story for the Ages
    Theoretical Population Biology 118 (2017) 46–49 Contents lists available at ScienceDirect Theoretical Population Biology journal homepage: www.elsevier.com/locate/tpb Commentary Commentary: Fisher's infinitesimal model: A story for the ages Michael Turelli Department of Evolution and Ecology, University of California, Davis, CA, USA article info a b s t r a c t Article history: Mendel (1866) suggested that if many heritable ``factors'' contribute to a trait, near-continuous variation Received 18 September 2017 could result. Fisher (1918) clarified the connection between Mendelian inheritance and continuous trait Available online 5 October 2017 variation by assuming many loci, each with small effect, and by informally invoking the central limit theorem. Barton et al. (2017) rigorously analyze the approach to a multivariate Gaussian distribution of Keywords: the genetic effects for descendants of parents who may be related. This commentary distinguishes three Polygenic traits Gaussian approximations nested approximations, referred to as ``infinitesimal genetics,'' ``Gaussian descendants'' and ``Gaussian Selection response population,'' each plausibly called ``the infinitesimal model.'' The first and most basic is Fisher's ``infinites- Additive effects imal'' approximation of the underlying genetics – namely, many loci, each making a small contribution to Epistasis the total variance. As Barton et al. (2017) show, in the limit as the number of loci increases (with enough Linkage disequilibria additivity), the distribution of genotypic values
    [Show full text]
  • The Infinitesimal Model: Definition, Derivation, and Implications Nick Barton, Alison Etheridge, Amandine Véber
    The infinitesimal model: definition, derivation, and implications Nick Barton, Alison Etheridge, Amandine Véber To cite this version: Nick Barton, Alison Etheridge, Amandine Véber. The infinitesimal model: definition, derivation, and implications. Theoretical Population Biology, Elsevier, 2017. hal-02113158 HAL Id: hal-02113158 https://hal.archives-ouvertes.fr/hal-02113158 Submitted on 27 Apr 2019 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. The infinitesimal model: definition, derivation, and implications N H Barton∗ Institute of Science and Technology Am Campus I A-3400 Klosterneuberg Austria Email [email protected] A M Etheridgey Department of Statistics University of Oxford 24{29 St Giles Oxford OX1 3LB, UK A V´eberz Centre de Math´ematiquesAppliqu´ees Ecole´ Polytechnique Route de Saclay 91128 Palaiseau Cedex France May 16, 2017 Abstract Our focus here is on the infinitesimal model. In this model, one or several quantitative traits are described as the sum of a genetic and a non-genetic component, the first being distributed within families as a normal random variable centred at the average of the parental genetic components, and with a variance independent of the parental traits.
    [Show full text]
  • 100 Years of Quantitative Genetics Theory and Its Applications: Celebrating the Centenary of Fisher 1918”
    Report on the meeting “100 years of quantitative genetics theory and its applications: celebrating the centenary of Fisher 1918” October 9, 2018 Wolfson Hall, Royal College of Surgeons, Nicolson St., Edinburgh EH8 9DW Sponsored by The Fisher Memorial Trust, the Genetics Society, the Galton Institute, the London Mathematical Society and the Royal Statistical Society Jessica G. King Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh Ronald A. Fisher’s 1918 paper, entitled “The correlation between relatives on the supposition of Mendelian inheritance” and published in the Transactions of the Royal Society of Edinburgh, set the foundations for the study of the genetics of quantitative traits. 100 years later, we celebrate Fisher’s contribution and reflect on the advances made since this classical paper first emerged. Context Prior to R.A. Fisher’s famous contribution, the genetic basis of evolutionary change was vigorously disputed between biometricians and Mendelians. In support of Darwin’s theory of evolution by natural selection, biometricians believed evolution to be a continuous process, having developed much of modern statistical methods such as regression and correlation to describe the inheritance of biometric (continuous or quantitative) traits. After the rediscovery of Mendel’s work on inheritance, the Mendelians argued against these views by vehemently supporting discontinuous evolution via Mendelian (discontinuous) traits controlled by the segregation of major genetic factors. The first attempts to reconcile the two opposing schools of thought were made independently by George Udny Yule in 1902 and Wilhelm Weinberg in 1910, whose studies were largely overlooked by both biometricians and Mendelians, blinded by the ongoing conflict.
    [Show full text]
  • An Infinitesimal Model for Quantitative Trait Genomic Value Prediction
    An Infinitesimal Model for Quantitative Trait Genomic Value Prediction Zhiqiu Hu1, Zhiquan Wang2, Shizhong Xu1* 1 Department of Botany and Plant Sciences, University of California Reverside, Reverside, California, United States of America, 2 Department of Agricultural Food and Nutritional Science, University of Alberta, Edmonton, Alberta, Canada Abstract We developed a marker based infinitesimal model for quantitative trait analysis. In contrast to the classical infinitesimal model, we now have new information about the segregation of every individual locus of the entire genome. Under this new model, we propose that the genetic effect of an individual locus is a function of the genome location (a continuous quantity). The overall genetic value of an individual is the weighted integral of the genetic effect function along the genome. Numerical integration is performed to find the integral, which requires partitioning the entire genome into a finite number of bins. Each bin may contain many markers. The integral is approximated by the weighted sum of all the bin effects. We now turn the problem of marker analysis into bin analysis so that the model dimension has decreased from a virtual infinity to a finite number of bins. This new approach can efficiently handle virtually unlimited number of markers without marker selection. The marker based infinitesimal model requires high linkage disequilibrium of all markers within a bin. For populations with low or no linkage disequilibrium, we develop an adaptive infinitesimal model. Both the original and the adaptive models are tested using simulated data as well as beef cattle data. The simulated data analysis shows that there is always an optimal number of bins at which the predictability of the bin model is much greater than the original marker analysis.
    [Show full text]
  • Fisher Infinitesimal Model, Orr Adaptive “Walk” Model and Biometrical Studies of Carotenoid Pathway of Carrots1
    FISHER INFINITESIMAL MODEL, ORR ADAPTIVE “WALK” MODEL AND BIOMETRICAL STUDIES OF CAROTENOID PATHWAY OF CARROTS1 Carlos Antonio Fernandes Santos2 Key words: QTL, heritability, minimum number of genes INTRODUCTION Under the classic infinitesimal model, introduced by Fisher (1930), the character is determined by an infinite number of unlinked and nonepistatic loci, each with an infinitesimal effect, with the both assumption of normality and linearity. Most models of selection response are based on the either explicit or implicit assumption that the infinitesimal model adequately describes the underlying dynamics (Walsh, 2000). According to the model, genetic progress in an infinitely large population under constant selection intensity is predicted to be linear with time, with infinitesimal change in the allele frequencies and genetic variance. Although clearly incorrect as a description of biology, the infinitesimal model has provided reasonably accurate predictions of response to selection over moderate to long periods in large population (Gibson, 1999). Orr (1998), based on QTL results and heuristic and approximated and mathematical derivation, suggested that the distribution of factors during adaptation assumes a pleasingly simple exponential form and that an exponential trend among factors fixed appears to be a general property of adaptation toward a fixed optimum. According to the author, the stepwise approach or an adaptive “walk” to the optimum involves several to many factors substitutions. So, after the first factor gets fixed the population moves closer to the optimum and thus the next factor fixed will, on average, have a smaller effect. The third factor in turn will typically be smaller than the second, and so on. Carrot estimations for QTLs, minimum number of genes and heritabilities are presented in this study in order to present some highlights about these two models, to propose approaches to deal with this debate and also to make comments about practical implication related with engineering of quantitative traits.
    [Show full text]